System for Accessing Distributed Data Cache Channel at Each Network Node to Pass Requests and Data

ABSTRACT

A cache apparatus for a network receives and responds to network file-services-protocol requests from client workstations coupled to the network. The cache apparatus includes a digital memory for storing data transmitted in responding to the network requests. A processing unit executes program instructions. A network interface couples the cache apparatus to the network. The interface includes program instructions, executed by the processing unit, for receiving the requests and transmitting responses thereto. A file-request-service module includes program instructions, executed by the processing unit, for interpreting the requests and generating responses thereto. The file-request-service module also checks the memory for the presence of an image of data specified by the request. When the data is present, the file-request-service module retrieves the data for inclusion in the response. A file-request-generation module includes program instructions, executed by the processing unit, for storing data received from the network and for generating requests for data that the file-request-service module determines to be missing from the memory. The network interface transmits file-request-generation module requests to the network.

This application is a continuation of U.S. patent application Ser. No. 10/794,723 filed Mar. 4, 2004, which is a continuation of Ser. No. 10/291,136 filed Nov. 9, 2002, that issued Oct. 12, 2004, as U.S. Pat. No. 6,804,706; which is a continuation of Ser. No. 09/760,258 filed Jan. 13, 2001, that issued Jan. 7, 2003, as U.S. Pat. No. 6,505,241 B2; which is a continuation of Ser. No. 09/382,311 filed Aug. 24, 1999, now U.S. Pat. No. 6,205,475; which is a continuation of Ser. No. 09/144,602 filed Aug. 31, 1998, that issued on Feb. 15, 2000, as U.S. Pat. No. 6,026,452; which is a division of Ser. No. 08/806,441 filed Feb. 26, 1997, that issued Apr. 6, 1999, as U.S. Pat. No. 5,892,914; which is a division of Ser. No. 08/343,477 filed Nov. 28, 1994, that issued Mar. 11, 1997, as U.S. Pat. No. 5,611,049, and that claimed priority under 35 U.S.C. § 371 from Patent Cooperation Treaty (“PCT”) International Patent Application PCT/US92/04939 filed Jun. 3, 1992. The entire contents of each of the above-listed applications are incorporated herein by reference.

TECHNICAL FIELD

The present invention relates generally to the technical field of multi-processor digital computer systems and, more particularly, to multi-processor computer systems in which:

    -   1. the processors are loosely coupled or networked together;
    -   2. data needed by some of the processors is controlled by a different processor that manages the storage of and access to the data;
    -   3. processors needing access to data request such access from the processor that controls the data;
    -   4. the processor controlling data provides requesting processors with access to it.

BACKGROUND ART

Within a digital computer system, processing data stored in a memory, e.g., a Random Access Memory (“RAM”), or on a storage device such as a floppy disk drive, a hard disk drive, a tape drive, etc., requires copying the data from one location to another prior to processing. Thus, for example, prior to processing data stored in a file in a comparatively slow speed storage device such as a hard disk, the data is first copied from the computer system's hard disk to its much higher speed RAM. After data has been copied from the hard disk to the RAM, the data is again copied from the RAM to the computer system's processing unit where it is actually processed. Each of these copies of the data, i.e., the copy of the data stored in the RAM and the copy of the data processed by the processing unit, can be considered to be an image of the data stored on the hard disk. Each of these images of the data may be referred to as a projection of the data stored on the hard disk.

In a loosely coupled or networked computer system having several processors that operate autonomously, the data needed by one processor may be accessed only by communications passing through one or more of the other processors in the system. For example, in a Local Area Network (“LAN”) such as Ethernet, one of the processors may be dedicated to operating as a file server that receives data from other processors via the network for storage on its hard disk, and supplies data from its hard disk to the other processors via the network. In such networked computer systems, data may pass through several processors in being transmitted from its source at one processor to the processor requesting it.

In some networked computer systems, images of data are transmitted directly from their source to a requesting processor. One operating characteristic of networked computer systems of this type is that, as the number of requests for access to data increases and/or the amount of data being transmitted in processing each request increases, ultimately the processor controlling access to the data or the data transmission network becomes incapable of responding to requests within an acceptable time interval. Thus, in such networked computer systems, an increasing workload on the processor controlling access to data or on the data transmission network ultimately causes unacceptably long delays between a processor's request to access data and completion of the requested access. In an attempt to reduce delays in providing access to data in networked computer systems, there presently exist systems that project an image of data from its source into an intermediate storage location in which the data is more accessible than at the source of the data. The intermediate storage location in such systems is frequently referred to as a “cache,” and systems that project images of data into a cache are referred to as “caching” systems.

An important characteristic of caching systems, frequently referred to as “cache consistency” or “cache coherency,” is their ability to simultaneously provide all processors in the networked computer system with identical copies of the data. If several processors concurrently request access to the same data, one processor may be updating the data while another processor is in the process of referring to the data being updated. For example, in commercial transactions occurring on a networked computer system one processor may be accessing data to determine if a customer has exceeded their credit limit while another processor is simultaneously posting a charge against that customer's account. If a caching system lacks cache consistency, it is possible that one processor's access to data to determine if the customer has exceeded their credit limit will use a projected image of the customer's data that has not been updated with the most recent charge. Conversely, in a caching system that possesses complete or absolute cache consistency, the processor that is checking the credit limit is guaranteed that the data it receives incorporates the most recent modifications.

One presently known system that employs data caching is the Berkeley Software Distribution (“BSD”) 4.3 version of the Unix timesharing operating system. The BSD 4.3 system includes a buffer cache located in the host computer's RAM for storing projected images of blocks of data, typically 8 k bytes, from files stored on a hard disk drive. Before a particular item of data may be accessed on a BSD 4.3 system, the requested data must be projected from the hard disk into the buffer cache. However, before the data may be projected from the disk into the buffer cache, space must first be found in the cache to store the projected image. Thus, for data that is not already present in a BSD 4.3 system's buffer cache, the system must perform the following steps in providing access to the data:

    -   Locate the buffer in the RAM that contains the Least Recently Used (“LRU”) block of disk data.
    -   Discard the LRU block of data, which may entail writing that block of data back to the hard disk.
    -   Project an image of the requested block of data into the now empty buffer.
    -   Provide the requesting processor with access to the data.

If the data being accessed by a processor is already present in a BSD 4.3 system's buffer cache, then responding to a processor's request for access to data requires only the last operation listed above. Because accessing data stored in RAM is much faster than accessing data stored on a hard disk, a BSD 4.3 system responds to requests for access to data that is present in its buffer cache in approximately 1/250th the time that it takes to respond to a request for access to data that is not already present in the buffer cache.
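The four steps listed above can be illustrated with a minimal sketch in C. It is not the BSD 4.3 implementation: the structure names, the fixed count of four buffers, the logical clock used to rank recency, and the counters standing in for actual disk transfers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF 4          /* hypothetical number of cache buffers */
#define NO_BLOCK (-1L)  /* marks a buffer that holds no disk block */

struct buf {
    long block;          /* disk block number held by this buffer */
    unsigned long used;  /* logical time of the most recent access */
    int dirty;           /* nonzero if modified since it was projected */
};

struct buf_cache {
    struct buf bufs[NBUF];
    unsigned long clock;        /* advanced on every access */
    unsigned long disk_reads;   /* stands in for real disk reads */
    unsigned long disk_writes;  /* stands in for real write-backs */
};

void cache_init(struct buf_cache *c)
{
    assert(c != NULL);
    for (int i = 0; i < NBUF; i++) {
        c->bufs[i].block = NO_BLOCK;
        c->bufs[i].used = 0;
        c->bufs[i].dirty = 0;
    }
    c->clock = 0;
    c->disk_reads = 0;
    c->disk_writes = 0;
}

/* Returns the buffer holding `block`, projecting it from disk on a miss. */
struct buf *cache_get(struct buf_cache *c, long block, int for_write)
{
    assert(c != NULL && block >= 0);
    struct buf *victim = &c->bufs[0];
    struct buf *found = NULL;

    for (int i = 0; i < NBUF; i++) {
        struct buf *b = &c->bufs[i];
        if (b->block == block) {
            found = b;              /* hit: only the last step is needed */
            break;
        }
        if (b->used < victim->used)
            victim = b;             /* step 1: locate the LRU buffer */
    }
    if (found == NULL) {
        if (victim->block != NO_BLOCK && victim->dirty)
            c->disk_writes++;       /* step 2: write the old block back */
        victim->block = block;      /* step 3: project the requested block */
        victim->dirty = 0;
        c->disk_reads++;
        found = victim;
    }
    found->used = ++c->clock;       /* step 4: grant access */
    if (for_write)
        found->dirty = 1;
    return found;
}
```

In this sketch a hit touches neither counter, while a miss always costs one simulated disk read and, when the discarded block had been modified, one simulated disk write as well.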

The consistency of data images projected into the buffer cache in a BSD 4.3 system is excellent. Since the only path from processors requesting access to data on the hard disk is through the BSD 4.3 system's buffer cache, out of date blocks of data in the buffer cache are always overwritten by their more current counterpart when that block's data returns from the accessing processor. Thus, in the BSD 4.3 system an image of data in the system's buffer cache always reflects the true state of the file. When multiple requests contend for the same image, the BSD 4.3 system queues the requests from the various processors and sequences the requests such that each request is completely serviced before any processing commences on the next request. Employing the preceding strategy, the BSD 4.3 system ensures the integrity of data at the level of individual requests for access to segments of file data stored on a hard disk.

Because the BSD 4.3 system provides access to data from its buffer cache, blocks of data on the hard disk frequently do not reflect the true state of the data. That is, in the BSD 4.3 system, frequently the true state of a file exists in the projected image in the system's buffer cache that has been modified since being projected there from the hard disk, and that has not yet been written back to the hard disk. In the BSD 4.3 system, images of data that are more current than and differ from their source data on the hard disk may persist for very long periods of time, finally being written back to the hard disk just before the image is about to be discarded due to its “death” by the LRU process. Conversely, other caching systems exist that maintain data stored on the hard disk current with its image projected into a data cache. Network File System (“NFS®”) is one such caching system.

In many ways, NFS's client cache resembles the BSD 4.3 system's buffer cache. In NFS, each client processor that is connected to a network may include its own cache for storing blocks of data. Furthermore, similar to BSD 4.3, NFS uses the LRU algorithm for selecting the location in the client's cache that receives data from an NFS server across the network, such as Ethernet. However, perhaps one of NFS's most significant differences is that images of blocks of data are not retrieved into NFS's client cache from a hard disk attached directly to the processor as in the BSD 4.3 system. Rather, in NFS images of blocks of data come to NFS's client cache from a file server connected to a network such as Ethernet.

The NFS client cache services requests from a computer program executed by the client processor using the same general procedures described above for the BSD 4.3 system's buffer cache. If the requested data is already projected into the NFS client cache, it will be accessed almost instantaneously. If requested data is not currently projected into NFS's client cache, the LRU algorithm must be used to determine the block of data to be replaced, and that block of data must be discarded before the requested data can be projected over the network from the file server into the recently vacated buffer.

In the NFS system, accessing data that is not present in its client cache takes approximately 500 times longer than accessing data that is present there. About one-half of this delay is due to the processing required for transmitting the data over the network from an NFS file server to the NFS client cache. The remainder of the delay is the time required by the file server to access the data on its hard disk and to transfer the data from the hard disk into the file server's RAM.
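The effect of the 500-fold miss penalty on average access time can be estimated with simple arithmetic. The sketch below assumes a cost model that the text does not state: a hit costs one unit, a miss costs `miss_penalty` units, and the mean is weighted by the hit ratio. The function name is hypothetical.

```c
#include <assert.h>

/* Mean access cost in units of one cache hit, for a given hit ratio. */
double mean_access_cost(double hit_ratio, double miss_penalty)
{
    assert(hit_ratio >= 0.0 && hit_ratio <= 1.0);
    assert(miss_penalty >= 1.0);
    return hit_ratio * 1.0 + (1.0 - hit_ratio) * miss_penalty;
}
```

Under this model, a client cache that satisfies nine requests out of ten still averages roughly 51 hit-times per access when the penalty is 500, which suggests why even a modest miss rate dominates perceived performance.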

In an attempt to reduce this delay, client processors read ahead to increase the probability that needed data will have already been projected over the network from the file server into the NFS client cache. When NFS detects that a client processor is accessing a file sequentially, blocks of data are asynchronously pre-fetched in an attempt to have them present in the NFS client cache when the client processor requests access to the data. Furthermore, NFS employs an asynchronous write behind mechanism to transmit all modified data images present in the client cache back to the file server without delaying the client processor's access to data in the NFS client cache until NFS receives confirmation from the file server that it has successfully received the data. Both the read ahead and the write behind mechanisms described above contribute significantly to NFS's reasonably good performance. Also contributing to NFS's good performance is its use of a cache for directories of files present on the file server, and a cache for attributes of files present on the file server.

Several features of NFS reduce the consistency of its projected images of data. For example, images of file data present in client caches are re-validated every 3 seconds. If an image of a block of data about to be accessed by a client is more than 3 seconds old, NFS contacts the file server to determine if the file has been modified since the file server originally projected the image of this block of data. If the file has been modified since the image was originally projected, the image of this block in the NFS client cache and all other projected images of blocks of data from the same file are removed from the client cache. When this occurs, the buffers in RAM thus freed are queued at the beginning of a list of buffers (the LRU list) that are available for storing the next data projected from the file server. The images of blocks of data discarded after a file modification are re-projected into NFS's client cache only if the client processor subsequently accesses them.
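The re-validation rule described above can be sketched as a small decision function. This is an illustration under stated assumptions rather than NFS source code: the structure, the enumeration, and the use of a modification time as the change indicator are hypothetical, and the age of an image is assumed to be measured from its most recent confirmation by the server.

```c
#include <assert.h>
#include <stddef.h>

#define REVALIDATE_SECS 3  /* re-validation interval given in the text */

struct cached_file {
    long validated_at;  /* time, in seconds, of the last server confirmation */
    long server_mtime;  /* modification time seen at that confirmation */
    int nblocks;        /* number of blocks of this file held in the cache */
};

enum action { USE_CACHED, USE_AFTER_CHECK, FLUSH_AND_REFETCH };

/*
 * Decides how an access at time `now` is handled. `mtime_on_server` is the
 * value the server would report; it is consulted only when the cached
 * image is older than the re-validation interval.
 */
enum action revalidate(struct cached_file *f, long now, long mtime_on_server)
{
    assert(f != NULL && now >= f->validated_at);
    if (now - f->validated_at <= REVALIDATE_SECS)
        return USE_CACHED;          /* young enough: server not contacted */
    f->validated_at = now;          /* the server has been consulted */
    if (mtime_on_server == f->server_mtime)
        return USE_AFTER_CHECK;     /* unchanged: cached blocks stay */
    f->server_mtime = mtime_on_server;
    f->nblocks = 0;                 /* drop every cached block of the file */
    return FLUSH_AND_REFETCH;
}
```

The first branch is where consistency is lost in this sketch: a file modified on the server continues to be served from the client cache until its image is more than 3 seconds old.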

If a client processor modifies a block of image data present in the NFS client cache, to update the file on the file server NFS asynchronously transmits the modified data image back to the server. Only when another client processor subsequently attempts to access a block of data in that file will its cache detect that the file has been modified.

Thus, NFS provides client processors with data images of poor consistency at reasonably good performance. However, NFS works only for those network applications in which client processors don't share data or, if they do share data, they do so under the control of a file locking mechanism that is external to NFS. There are many classes of computer application programs that execute quite well if they access files directly using the Unix File System that cannot use NFS because of the degraded images projected by NFS.

Another limitation imposed by NFS is the relatively small size (8 k bytes) of data that can be transferred in a single request. Because of this small transfer size, processes executing on a client processor must continually request additional data as they process a file. The client cache, which typically occupies only a few megabytes of RAM in each client processor, at best, reduces this workload to some degree. However, the NFS client cache cannot mask NFS's fundamental character that employs constant, frequent communication between a file server and all of the client processors connected to the network. This need for frequent server/client communication severely limits the scalability of an NFS network, i.e., severely limits the number of processors that may be networked together in a single system.

Andrew File System (“AFS”) is a data caching system that has been specifically designed to provide very good scalability. Now used at many universities, AFS has demonstrated that a few file servers can support thousands of client workstations distributed over a very large geographic area. The major characteristics of AFS that permit its scalability are:

    -   The unit of cached data increases from NFS's 8 k disk block to an entire file. AFS projects complete files from the file server into the client workstations.
    -   Local hard disk drives, required on all AFS client workstations, hold projected file images. Since AFS projects images of complete files, its RAM is quickly occupied by image projections. Therefore, AFS projects complete files onto a client's local hard disk, where they can be locally accessed many times without requiring any more accesses to the network or to the file server.
    -   In addition to projecting file images onto a workstation's hard disk, similar to BSD 4.3, AFS also employs a buffer cache located in the workstation's RAM to store images of blocks of data projected from the file image stored on the workstation's hard disk.

Under AFS, when a program executing on the workstation opens a file, a new file image is projected into the workstation from the file server only if the file is not already present on the workstation's hard disk, or if the file on the file server supersedes the image stored on the workstation's hard disk. Thus, assuming that an image of a file has previously been projected from a network's file server into a workstation, a computer program's request to open that file requires, at a minimum, that the workstation transmit at least one message back to the server to confirm that the image currently stored on its hard disk is the most recent version. This re-validation of a projected image requires a minimum of 25 milliseconds for files that haven't been superseded. If the image of a file stored on the workstation's hard disk has been superseded, then it must be re-projected from the file server into the workstation, a process that may require several seconds. After the file image has been re-validated or re-projected, programs executed by the workstation access it via AFS's local file system and its buffer cache with response times comparable to those described above for BSD 4.3.

The consistency of file images projected by AFS starts out as being “excellent” for a brief moment, and then steadily degrades over time. File images are always current immediately after the image has been projected from the file server into the client processor, or re-validated by the file server. However, several clients may receive the same file projection at roughly the same time, and then each client may independently begin modifying the file. Each client remains completely unaware of any modifications being made to the file by other clients. As the computer program executed by each client processor closes the file, if the file has been modified the image stored on the processor's hard disk is transmitted back to the server. Each successive transmission from a client back to the file server overwrites the immediately preceding transmission. The version of the file transmitted from the final client processor to the file server is the version that the server will subsequently transmit to client workstations when they attempt to open the file. Thus at the conclusion of such a process the file stored on the file server incorporates only those modifications made by the final workstation to transmit the file, and all modifications made at the other workstations have been lost. While the AFS file server can detect when one workstation's modifications to a file overwrite modifications made to the file by another workstation, there is little the server can do at this point to prevent this loss of data integrity.

AFS, like NFS, fails to project images with absolute consistency. If computer programs don't employ a file locking mechanism external to AFS, the system can support only applications that don't write to shared files. This characteristic of AFS precludes using it for any application that demands high integrity for data written to shared files.

DISCLOSURE OF THE INVENTION

An object of the present invention is to provide a digital computer system capable of projecting larger data images, over greater distances, at higher bandwidths, and with much better consistency than the existing data caching mechanisms.

Another object of the present invention is to provide a generalized data caching mechanism capable of projecting multiple images of a data structure from its source into sites that are widely distributed across a network.

Another object of the invention is to provide a generalized data caching mechanism in which an image of data always reflects the current state of the source data structure, even when it is being modified concurrently at several remote sites.

Another object of the present invention is to provide a generalized data caching mechanism in which a client process may operate directly upon a projected image as though the image were actually the source data structure.

Another object of the present invention is to provide a generalized data caching mechanism that extends the domain over which data can be transparently shared.

Another object of the present invention is to provide a generalized data caching mechanism that reduces delays in responding to requests for access to data by projecting images of data that may be directly processed by a client site into sites that are “closer” to the requesting client site.

Another object of the present invention is to provide a generalized data caching mechanism that transports data from its source into the projection site(s) efficiently.

Another object of the present invention is to provide a generalized data caching mechanism that anticipates future requests from clients and, when appropriate, projects data toward the client in anticipation of the client's request to access data.

Another object of the present invention is to provide a generalized data caching mechanism that maintains the projected image over an extended period of time so that requests by a client can be repeatedly serviced from the initial projection of data.

Another object of the present invention is to provide a generalized data caching mechanism that employs an efficient consistency mechanism to guarantee absolute consistency between a source of data and all projected images of the data.

Briefly, the present invention is a cache apparatus for use in a network. The cache apparatus receives network file-services-protocol requests from client workstations coupled to the network and responds to the requests. The cache apparatus includes a digital memory for storing data transmitted in responding to the network file-services-protocol requests. A processing unit, included in the cache apparatus, executes program instructions. A network interface of the cache apparatus couples the cache apparatus to the network. The interface includes program instructions, executed by the processing unit, for receiving the network file-services-protocol requests and transmitting responses to the requests. A file-request-service module of the cache apparatus includes program instructions, executed by the processing unit, for interpreting the network file-services-protocol requests and generating responses to the requests. The file-request-service module also checks the memory to determine if an image of data specified by the request is present, and when the data is present, retrieves the data to be included in the response. A file-request-generation module of the cache apparatus includes program instructions, executed by the processing unit, for storing data received from the network and for generating network file-services-protocol requests for data specified in the requests received by the file-request-service module and determined to be missing from the memory by the file-request-service module. The network interface transmits file-request-generation module requests to the network.
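The division of labor between the file-request-service module and the file-request-generation module may be pictured with the following minimal sketch in C. Every name in it is hypothetical and none is taken from the figures: `service_request` plays the role of the service module, the `fetch_fn` callback stands in for the generation module sending a request over the network, `stub_fetch` is a placeholder for the remote source, and round-robin slot reuse is assumed purely for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOTS 8         /* hypothetical number of cached images */
#define IMAGE_BYTES 16  /* hypothetical size of one image */

struct image {
    int valid;
    long dataset_id;
    long offset;
    char data[IMAGE_BYTES];
};

struct cache_apparatus {
    struct image slots[SLOTS];
    size_t next;                       /* next slot to fill, round-robin */
    unsigned long requests_generated;  /* requests sent toward the network */
};

/* Stand-in for the file-request-generation path: 0 means success. */
typedef int (*fetch_fn)(long dataset_id, long offset, char *out, size_t len);

/* Placeholder remote source that fills the buffer with one letter. */
int stub_fetch(long dataset_id, long offset, char *out, size_t len)
{
    assert(dataset_id >= 0 && offset >= 0);
    memset(out, 'A' + (int)((dataset_id + offset) % 26), len);
    return 0;
}

static struct image *find_image(struct cache_apparatus *c, long id, long off)
{
    for (size_t i = 0; i < SLOTS; i++) {
        struct image *img = &c->slots[i];
        if (img->valid && img->dataset_id == id && img->offset == off)
            return img;
    }
    return NULL;
}

/*
 * Service-module role: answer from memory when the image is present,
 * otherwise have the missing data requested and stored before replying.
 * The cache structure must be zero-initialized before first use.
 */
const char *service_request(struct cache_apparatus *c, long id, long off,
                            fetch_fn fetch)
{
    assert(c != NULL && fetch != NULL);
    struct image *img = find_image(c, id, off);
    if (img == NULL) {
        img = &c->slots[c->next];
        c->next = (c->next + 1) % SLOTS;
        c->requests_generated++;
        if (fetch(id, off, img->data, IMAGE_BYTES) != 0) {
            img->valid = 0;
            return NULL;               /* the data could not be obtained */
        }
        img->valid = 1;
        img->dataset_id = id;
        img->offset = off;
    }
    return img->data;
}
```

In this sketch a repeated request for the same data is answered from memory without generating a second network request, which is the behavior the summary attributes to the cache apparatus.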

These and other features, objects and advantages will be understood or apparent to those of ordinary skill in the art from the following detailed description of the preferred embodiment as illustrated in the various drawing figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram depicting a networked, multi-processor digital computer system that includes an NDC server terminator site, an NDC client terminator site, and a plurality of intermediate NDC sites, each NDC site in the networked computer system operating to permit the NDC client terminator site to access data stored at the NDC server terminator site;

FIG. 2 is a block diagram that provides another way of illustrating the networked, multi-processor digital computer system of FIG. 1;

FIG. 3 is a block diagram depicting a structure of the NDC included in each NDC site of FIG. 1 including the NDC buffers;

FIG. 4, made up of FIGS. 4A and 4B, is a computer program listing written in the C programming language setting forth a data structure of a channel and of a subchannel included in the channel that are used by the NDC of FIG. 3;

FIG. 5 is a table written in the C programming language that specifies the values of various flags used by the channel illustrated in FIG. 4;

FIG. 6 is a table written in the C programming language that defines the values of various flags used in specifying the state of channels;

FIG. 7 is a block diagram illustrating projected images of a single dataset being transferred through the NDC site depicted in FIG. 3 and illustrating the storage of various segments of the dataset in the NDC buffers;

FIG. 8 is a block diagram depicting a channel and a plurality of subchannels operating to access various segments of a dataset that have been projected into the NDC buffers illustrated in FIGS. 3 and 7;

FIG. 9 is a table written in the C programming language defining the message type codes for the various different Data Transfer Protocol (“DTP”) messages that can be transmitted between NDC sites;

FIG. 10, made up of FIGS. 10A and 10B, is a definition written in the C programming language of the data structure for DTP messages;

FIG. 11, made up of FIGS. 11A, 11B, 11C, 11D, 11E, 11F, 11G, 11H, and 11I, sets forth definitions written in the C programming language for various data substructures incorporated into the structures of FIGS. 4 and 10;

FIG. 12 is a definition written in the C programming language of the data structure that is used in chaining together DTP messages;

FIG. 13, made up of FIGS. 13A and 13B, is a definition written in the C programming language for a data structure that contains the channel's metadata;

FIG. 14 is a definition written in the C programming language setting forth the structure of an upstream site structure that is used by the NDC of FIGS. 3 and 7 for storing information about the activity of upstream NDC sites in accessing a dataset stored at the NDC server terminator site;

FIG. 15 is a block diagram illustrating a tree of NDC sites including an NDC server terminator site having a stored file that may be accessed from a plurality of NDC client terminator sites; and

FIG. 16 is a block diagram illustrating the application of the NDC within a file server employing a network of digital computers.

BEST MODE FOR CARRYING OUT THE INVENTION

FIG. 1 is a block diagram depicting a networked, multiprocessor digital computer system referred to by the general reference character 20. The digital computer system 20 includes a Network Distributed Cache (“NDC™”) server site 22, an NDC client site 24, and a plurality of intermediate NDC sites 26A and 26B. Each of the NDC sites 22, 24, 26A and 26B in the digital computer system 20 includes a processor and RAM, neither of which is illustrated in FIG. 1. Furthermore, the NDC server site 22 includes a hard disk 32 for storing data that may be accessed by the client site 24. The NDC client site 24 and the intermediate NDC site 26B both include their own respective hard disks 34 and 36. A client workstation 42 communicates with the NDC client site 24 via an Ethernet Local Area Network (“LAN”) 44 in accordance with a network protocol such as that of the NFS systems identified above.

Each of the NDC sites 22, 24, 26A and 26B in the networked computer system 20 includes an NDC 50, an enlarged version of which is depicted for intermediate site 26A. The NDCs 50 in each of the NDC sites 22, 24, 26A and 26B include a set of computer programs and a data cache located in the RAM of the NDC sites 22, 24, 26A and 26B. The NDCs 50 together with Data Transfer Protocol (“DTP™”) messages 52, illustrated in FIG. 1 by the lines joining pairs of NDCs 50, provide a data communication network by which the client workstation 42 may access data on the hard disk 32 via the NDC sites 24, 26B, 26A and 22.

The NDCs 50 operate on a data structure called a “dataset.” Datasets are named sequences of bytes of data that are addressed by:

    -   a server-id that identifies the NDC server site where source data is located, such as NDC server site 22; and
    -   a dataset-id that identifies a particular item of source data stored at that site, usually on a hard disk, such as the hard disk 32 of the NDC server site 22.

The dataset-id may specify a file on the hard disk 32 of the NDC server site 22, in which case it would likely be a compound identifier (filesystem id, file id), or it may specify any other contiguous byte sequence that the NDC server site 22 is capable of interpreting and is willing to transmit to the NDC client site 24. For example, a dataset could be ten pages from the RAM of the NDC server site 22. Such a ten page segment from the RAM of the NDC server site 22 might itself be specified with a filesystem-id that identifies virtual memory and a file-id that denotes the starting page number within the virtual memory.

The NDC client site 24 requests access to data from the NDC server site 22 using an NDC_LOAD message specifying whether the type of activity being performed on the dataset at the NDC client site 24 is a read or a write operation. The range of data requested with an NDC_LOAD message specifies the byte sequences within the named dataset that are being accessed by the NDC client site 24. A single request by the NDC client site 24 may specify several disparate byte sequences, with no restriction on the size of each sequence other than it be discontiguous from all other sequences specified in the same request. Thus, each request to access data by the NDC client site 24 contains a series of range specifications, each one of which is a list of offset/length pairs that identify individual contiguous byte sequences within the named dataset.
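One way to picture a dataset address together with the range list of an NDC_LOAD request is the C sketch below. It does not reproduce the definitions of FIGS. 4, 10 or 11: all type and field names are hypothetical, a single flat list of offset/length pairs is assumed, and “discontiguous” is interpreted here as meaning that no two sequences overlap or touch.

```c
#include <assert.h>
#include <stddef.h>

enum access_type { ACCESS_READ, ACCESS_WRITE };

struct byte_range {
    unsigned long offset;  /* first byte of the sequence within the dataset */
    unsigned long length;  /* number of bytes in the sequence */
};

/* Hypothetical shape of a load request; not the DTP message of FIG. 10. */
struct load_request {
    unsigned long server_id;      /* site holding the source data */
    unsigned long filesystem_id;  /* first half of a compound dataset-id */
    unsigned long file_id;        /* second half of a compound dataset-id */
    enum access_type type;        /* read or write activity at the client */
    size_t nranges;
    const struct byte_range *ranges;
};

/*
 * Returns 1 when every range is non-empty and separated from every other
 * range by at least one byte, and 0 otherwise. Arithmetic overflow of
 * offset + length is not considered in this sketch.
 */
int ranges_are_discontiguous(const struct byte_range *r, size_t n)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].length == 0)
            return 0;
        for (size_t j = i + 1; j < n; j++) {
            unsigned long end_i = r[i].offset + r[i].length;
            unsigned long end_j = r[j].offset + r[j].length;
            if (r[i].offset <= end_j && r[j].offset <= end_i)
                return 0;  /* the two sequences overlap or touch */
        }
    }
    return 1;
}
```

A sender could run such a check before transmitting a request, merging any touching sequences into a single offset/length pair so that each pair names one maximal contiguous byte sequence.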

Topology of an NDC Network

An NDC network, such as that illustrated in FIG. 1 having NDC sites 22, 24, 26A and 26B, includes:

    -   1. all nodes in a network of processors that are configured to participate as NDC sites; and
    -   2. the DTP messages 52 that bind together NDC sites, such as NDC sites 22, 24, 26A and 26B.

Any node in a network of processors that possesses a megabyte or more of surplus RAM may be configured as an NDC site. NDC sites communicate with each other via the DTP messages 52 in a manner that is completely compatible with non-NDC sites.

FIG. 1 depicts a series of NDC sites 22, 24, 26A and 26B linked together by the DTP messages 52 that form a chain connecting the client workstation 42 to the NDC server site 22. The NDC chain may be analogized to an electrical transmission line. The transmission line of the NDC chain is terminated at both ends, i.e., by the NDC server site 22 and by the NDC client site 24. Thus, the NDC server site 22 may be referred to as an NDC server terminator site for the NDC chain, and the NDC client site 24 may be referred to as an NDC client terminator site for the NDC chain. An NDC server terminator site 22 will always be the node in the network of processors that “owns” the source data structure.

The other end of the NDC chain, the NDC client terminator site 24, is the NDC site that receives requests from the client workstation 42 to access data on the NDC server site 22.

Data being written to the hard disk 32 at the NDC server site 22 by the client workstation 42 flows in a “downstream” direction indicated by a downstream arrow 54. Data being loaded by the client workstation 42 from the hard disk 32 at the NDC server site 22 is pumped “upstream” through the NDC chain in the direction indicated by an upstream arrow 56 until it reaches the NDC client site 24. When data reaches the NDC client site 24, it is reformatted, together with its metadata, into a reply message in accordance with the appropriate network protocol such as NFS, and sent back to the client workstation 42. NDC sites are frequently referred to as being either upstream or downstream of another NDC site. The downstream NDC site 22, 26A or 26B must be aware of the types of activities being performed at its upstream NDC sites 26A, 26B or 24 at all times.

In the network depicted in FIG. 1, a single request by the client workstation 42 to read data stored on the hard disk 32 is serviced in the following manner:

1. The request flows across the Ethernet LAN 44 to the NDC client site 24 which serves as a gateway to the NDC chain. Within the NDC client site 24, an NDC client intercept routine 102, illustrated in FIGS. 3 and 7, inspects the request. If the request is an NFS request and if the request is directed at any NDC site 24, 26A, 26B, or 22 for which the NDC client site 24 is a gateway, then the request is intercepted by the NDC client intercept routine 102.
2. The NDC client intercept routine 102 converts the NFS request into a DTP request, and then submits the request to an NDC core 106.
3. The NDC core 106 in the NDC client site 24 receives the request and checks its NDC cache to determine if the requested data is already present there. If all data is present in the NDC cache of the NDC client site 24, the NDC 50 will copy pointers to the data into a reply message structure and immediately respond to the calling NDC client intercept routine 102.
4. If all the requested data isn't present in the NDC cache of the NDC client site 24, then the NDC 50 will access any missing data elsewhere. If the NDC site 24 were a server terminator site, then the NDC 50 would access the filesystem for the hard disk 34 upon which the data would reside.
5. Since the NDC client site 24 is a client terminator site rather than a server terminator site, the NDC 50 must request the data it needs from the next downstream NDC site, i.e., intermediate NDC site 26B in the example depicted in FIG. 1. Under this circumstance, DTP client interface routines 108, illustrated in FIGS. 3 and 7, are invoked to request from the intermediate NDC site 26B whatever additional data the NDC client site 24 needs to respond to the current request.
6. A DTP server interface routine 104, illustrated in FIGS. 3 and 7, at the downstream intermediate NDC site 26B receives the request from the NDC 50 of the NDC client site 24 and processes it according to steps 3, 4, and 5 above. The preceding sequence repeats for each of the NDC sites 24, 26B, 26A and 22 in the NDC chain until the request reaches the server terminator, i.e., NDC server site 22 in the example depicted in FIG. 1, or until the request reaches an NDC site that has all the data that is being requested of it.
7. When the NDC server terminator site 22 receives the request, its NDC 50 accesses the source data structure. If the source data structure resides on a hard disk, the appropriate file system code (UFS, DOS, etc.) is invoked to retrieve the data from the hard disk 32.
8. When the file system code on the NDC server site 22 returns the data from the hard disk 32, a response chain begins whereby each downstream site successively responds upstream to its client, e.g. NDC server site 22 responds to the request from intermediate NDC site 26A, intermediate NDC site 26A responds to the request from intermediate NDC site 26B, etc.
9. Eventually, the response percolates up through the sites 22, 26A, and 26B to the NDC client terminator site 24.
10. The NDC 50 on the NDC client site 24 returns to the calling NDC client intercept routine 102, which then packages the returned data and metadata into an appropriate network protocol format, such as that for an NFS reply, and sends the data and metadata back to the client workstation 42.
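
The downstream traversal described in steps 3 through 7 may be summarized by the following minimal C sketch. The structure ndc_site, its fields, and the function ndc_resolve are hypothetical illustrations and do not appear in the figures. The sketch models each site only by whether its NDC cache already holds all of the requested data and whether it is the server terminator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one NDC site in a chain; not taken from the figures. */
struct ndc_site {
    const char *name;
    bool cache_has_all;      /* step 3: all requested data is cached here */
    bool server_terminator;  /* step 7: site owns the source data structure */
};

/*
 * sites[0] is the client terminator and sites[n-1] is the server terminator.
 * Returns the index of the site that satisfies the request, or -1 if no site
 * in the array can (a malformed chain with no server terminator).
 */
int ndc_resolve(const struct ndc_site *sites, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (sites[i].cache_has_all)
            return (int)i;   /* answered from this site's NDC cache */
        if (sites[i].server_terminator)
            return (int)i;   /* answered from the file system */
        /* otherwise a DTP request goes to the next downstream site */
    }
    return -1;
}
```

In this model the returned index also equals the number of upstream hops the response makes before it reaches the client terminator site.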

The NDC client intercept routines 102 are responsible for performing all conversions required between any supported native protocol, e.g. NFS, Server Message Block (“SMB”), Novell Netware®, etc., and the DTP messages 52 employed for communicating among the NDCs 50 making up the NDC chain. The conversion between each native protocol and the DTP messages 52 must be so thorough that client workstations, such as the client workstation 42, are unable to distinguish any difference in operation between an NDC 50 functioning as a server to that workstation and that workstation's “native” server.

An alternative way of visualizing the operation of the NDCs 50′ is illustrated in FIG. 2. Those elements depicted in FIG. 2 that are common to the digital computer system 20 depicted in FIG. 1 bear the same reference numeral distinguished by a prime (′) designation. The NDCs 50′ in the sites 22′, 26A′, 26B′ and 24′ provide a very high speed data conduit 62 connecting the client intercept routines 102 of the NDC client site 24′ to file system interface routines 112 of the NDC server site 22′, illustrated in FIGS. 3 and 7. Client workstations, using their own native protocols, may plug into the data conduit 62 at each of the NDC sites 22, 26A, 26B and 24 via the NDC client intercept routines 102 in each of the NDC sites 22, 26A, 26B and 24. Accordingly, the NDC 50 of the intermediate NDC site 26A may interface into a Novell Netware network 64. Similarly, the NDC 50 of the intermediate NDC site 26B may interface into an SMB network 66, and into an NFS network 68. If an NDC site 24, 26B, 26A or 22 is both the client terminator site and the server terminator site for a request to access data, then the NDC data conduit 62 is contained entirely within that NDC site 24, 26B, 26A or 22.

After an NDC 50′ intercepts a request from a client workstation on one of the networks 44′, 64, 66 or 68 and converts it into the DTP messages 52′, the request travels through the data conduit 62 until all the data has been located. If a request is a client's first for a particular dataset, the DTP messages 52′ interconnecting each pair of NDCs 50′ form the data conduit 62 just in advance of a request's passage. If a request reaches the NDC server terminator site 22′, the NDC 50′ directs it to the appropriate file system on the NDC server terminator site 22′. Each NDC site 22′ may support several different types of file systems for hard disks attached thereto such as the hard disks 32′, 34′, and 36′.

After the file system at the NDC server terminator site 22′ returns the requested data to its NDC 50′, the NDC 50′ passes the reply data and metadata back up through each NDC site 26A′ and 26B′ until it reaches the client terminator 24′. At the client terminator 24′, the NDC routine originally called by the NDC client intercept routine 102 returns back to that routine. The NDC client intercept routine 102 then reformats the data and metadata into an appropriately formatted reply message and dispatches that message back to the client workstation 42′.

Four components of the NDC 50′ support the data conduit 62:

-   The resource management mechanisms of the NDC client terminator site, which measure the rate at which its client workstations consume data and also note whether the data is being accessed sequentially. Each NDC 50′ also measures the rate of replenishment from downstream NDC sites.
-   The pre-fetch mechanism that enables each of the NDC sites 22′, 24′, 26A′ and 26B′ to operate autonomously, thereby reducing network traffic substantially and enabling each NDC site to directly respond to requests from client workstations or upstream NDC sites.
-   The DTP message 52′, which allows multiple data segments of any length to be transferred with a single request.
-   The consistency control mechanism that very efficiently monitors and maintains the integrity of all projections of data from the NDC server terminator site 22′ to the NDC client terminator site 24′.

NDC 50

As depicted in FIGS. 3 and 7, the NDC 50 includes five major components:

-   client intercept routines 102;
-   DTP server interface routines 104;
-   NDC core 106;
-   DTP client interface routines 108; and
-   file system interface routines 112.

Routines included in the NDC core 106 implement the function of the NDC 50. The other routines 102, 104, 108 and 112 supply data to and/or receive data from the NDC core 106. The main building block of the NDC core 106 is a data structure called a channel 116 illustrated in FIG. 4. The NDC core 106 typically includes anywhere from 2,000 to 100,000 channels 116, depending on the size of the NDC site 22, 24, 26A or 26B. The RAM in each NDC site 22, 24, 26A or 26B that is occupied by the channels 116 is allocated to the NDC 50 upon initialization of the NDC site. Each channel 116 is a conduit for projecting images of a dataset further upstream, or, if the channel 116 for the dataset is located in the client terminator site 24, it also provides the space into which the data images are projected. The routines of the NDC core 106, described in greater detail below, are responsible for maintaining data images within the NDC site 22, 24, 26A or 26B or expediting their passage through the NDC site 22, 24, 26A or 26B.

FIG. 5 is a table written in the C programming language that specifies the values of various flags used in controlling the operation of the NDC sites 22, 26A, 26B and 24. FIG. 6 is a table written in the C programming language that lists the values of various flags used in specifying the state of channels 116. Depending upon the operation of the NDC 50, the values of various ones of the flags listed in FIGS. 5 and 6 will be assigned to the channels 116 or other data structures included in the NDC 50.

The client intercept routines 102, illustrated in FIGS. 3 and 7, are needed only at NDC sites which may receive requests for data in a protocol other than DTP, e.g., a request in NFS protocol, SMB protocol, or another protocol. These routines are completely responsible for all conversions necessary to interface a projected dataset image to a request that has been submitted via any of the industry standard protocols supported at the NDC site 22, 24, 26A or 26B.

NDC sites 22, 24, 26A and 26B are always equipped with both the DTP server interface routines 104 and the DTP client interface routines 108. NDC sites 22, 24, 26A and 26B communicate via the DTP messages 52 which move raw data, independent not only of any protocol such as NFS, SMB, or Netware, but also of any structure other than byte sequences within an identified dataset. The DTP messages 52 enable a single request to specify multiple segments of a named set of data as the targets of a single operation. Each segment specified in a DTP request is a sequence of consecutive bytes of data of any length.

The file system interface routines 112 are included in the NDC 50 only at NDC file server sites, such as the NDC server site 22. The file system interface routines 112 route data between the disk drives 32A, 32B and 32C illustrated in FIG. 3 and the NDC data conduit 62 that extends from the NDC server terminator site 22 to the NDC client terminator site 24.

Another illustration of the NDC 50, depicted in FIG. 7, portrays an NDC data conduit 62 passing through an NDC site, such as the NDC sites 22, 24, 26A or 26B. The NDC data conduit 62, stretching from the NDC server terminator site 22 to the NDC client terminator site 24, is composed of the channels 116 at each NDC site 22, 24, 26A or 26B that have bound together to form an expressway for transporting data between the NDC server terminator site 22 and the NDC client terminator site 24. Each channel 116 in the chain of NDC sites 22, 24, 26A and 26B is capable of capturing and maintaining images of data that pass through it, unless a concurrent write sharing (“CWS”) condition exists for that data. However, whether a channel 116 opts to capture an image of data passing through the NDC site 22, 24, 26A or 26B depends heavily upon the location of the channel 116 in the NDC data conduit 62. There are three possible locations for a channel 116 in the NDC data conduit 62.

First, a channel 116 may be located at the NDC client terminator site 24, in which case images of data are projected and sustained within the NDC site by the routines in the NDC core 106 with substantial assistance from the DTP client interface routines 108. The NDC 50 at the NDC client terminator site 24 services requests from clients, such as the client workstation 42, directly from projected images via the client intercept routines 102. Most image projections are sustained only in client terminator sites, such as the NDC client terminator site 24.

Second, a channel 116 may be located at an intermediate NDC site, such as the intermediate NDC sites 26A or 26B, in which case images are usually projected within the NDC site only for the minimum time required for the data to traverse the NDC site. However, if a CWS condition exists for a channel 116, the channel 116 at an intermediate NDC site 26A or 26B that controls the consistency of the data will capture and sustain images that otherwise would have been projected further upstream to the NDC client terminator site 24. The NDC 50 at an intermediate NDC site 26A or 26B employs the DTP server interface routines 104, the routines of the NDC core 106, and the DTP client interface routines 108 to provide these functions.

Third, a channel 116 may be located at a server terminator, such as the NDC server terminator site 22, in which case images are usually projected within the NDC site only for the minimum time required for the data to traverse the site. The NDC 50 at an NDC server terminator site 22 employs the DTP server interface routines 104, the routines in the NDC core 106, and the file system interface routines 112. NDC server terminator sites operate in most respects similarly to an intermediate NDC site. However, if the NDC server terminator site 22 lacks requested data, it invokes one of the file system interface routines 112 instead of one of the DTP client interface routines 108 to obtain the needed data.

If the client intercept routines 102 of the NDC 50 receive a request to access data from a client, such as the client workstation 42, they prepare a DTP request indicated by the arrow 122 in FIG. 3. If the DTP server interface routines 104 of the NDC 50 receive a request from an upstream NDC 50, they prepare a DTP request indicated by the arrow 124 in FIG. 3. DTP requests 122 and 124 are presented to the NDC core 106. Within the NDC core 106, the DTP request 122 or 124 causes a buffer search routine 126 to search a pool 128 of NDC buffers 129, as indicated by the arrow 130 in FIG. 3, to determine if all the data requested by either the routines 102 or 104 is present in the NDC buffers 129 of this NDC 50. (The channel 116 together with the NDC buffers 129 assigned to the channel 116 may be referred to collectively as the NDC cache.) If all the requested data is present in the NDC buffers 129, the buffer search routine 126 prepares a DTP response, indicated by the arrow 132 in FIG. 3, that responds to the request 122 or 124, and the NDC core 106 appropriately returns the DTP response 132, containing both data and metadata, either to the client intercept routines 102 or to the DTP server interface routines 104 depending upon which routine 102 or 104 submitted the request 122 or 124. If the client intercept routine 102 receives the DTP response 132, then before returning the requested data and metadata to the client workstation 42 it reformats the response from DTP to the protocol in which the client workstation 42 requested access to the dataset, e.g. into NFS, SMB, Netware or any other protocol.

If all the requested data is not present in the NDC buffers 129, then the buffer search routine 126 prepares a DTP downstream request, indicated by the arrow 142 in FIG. 3, for only that data which is not present in the NDC buffers 129. A request director routine 144 then directs the DTP request 142 to the DTP client interface routines 108, if this NDC 50 is not located in the NDC server terminator site 22, or to the file system interface routines 112, if this NDC 50 is located in the NDC server terminator site 22. After the DTP client interface routines 108 obtain the requested data together with its metadata from a downstream NDC site 22, 26A, etc. or the file system interface routines 112 obtain the data from the file system of this NDC server terminator site 22, the data is stored into the NDC buffers 129 and the buffer search routine 126 returns the data and metadata either to the client intercept routines 102 or to the DTP server interface routines 104 as described above.
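
The preparation of a downstream request for only the missing data, and its routing by the request director routine 144, may be illustrated by the following minimal C sketch. All names (span, ndc_missing, ndc_target, ndc_direct) are hypothetical, and the sketch assumes a single cached segment, whereas the NDC buffers 129 may hold several.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; a single cached segment is assumed for brevity. */
struct span { uint64_t offset; uint64_t length; };

/* Writes the uncached pieces of req (at most two) into out[] and returns
 * how many pieces must be requested downstream. */
size_t ndc_missing(struct span req, struct span cached, struct span out[2])
{
    uint64_t req_end = req.offset + req.length;
    uint64_t c_end = cached.offset + cached.length;
    size_t n = 0;

    /* No overlap: the whole request must be fetched. */
    if (cached.length == 0 || c_end <= req.offset || cached.offset >= req_end) {
        if (req.length > 0)
            out[n++] = req;
        return n;
    }
    if (req.offset < cached.offset) {        /* piece before the cached data */
        out[n].offset = req.offset;
        out[n].length = cached.offset - req.offset;
        n++;
    }
    if (c_end < req_end) {                   /* piece after the cached data */
        out[n].offset = c_end;
        out[n].length = req_end - c_end;
        n++;
    }
    return n;
}

enum ndc_target { TARGET_DTP_CLIENT_INTERFACE, TARGET_FILE_SYSTEM_INTERFACE };

/* Routing rule of the request director as described in the text. */
enum ndc_target ndc_direct(bool at_server_terminator)
{
    return at_server_terminator ? TARGET_FILE_SYSTEM_INTERFACE
                                : TARGET_DTP_CLIENT_INTERFACE;
}
```

A return value of zero from ndc_missing corresponds to the case in which the response can be prepared entirely from the NDC buffers 129.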

Channels 116

The NDC 50 employs channels 116 to provide a data pathway through each NDC site 22, 24, 26A and 26B, and to provide a structure for storing a history of patterns of accessing each dataset for each client, such as the client workstation 42, as well as performance measurements on both clients and the NDC server terminator site 22. Using this information, the NDC 50 is able to anticipate future demand by the client, such as the client workstation 42, and the latencies that will be incurred on any request that must be directed downstream toward the NDC server terminator site 22.

Channels 116 are the main data structure making up the NDC 50. Each channel 116 enables an image of data to be projected into the site. For small datasets (144 k or less), the image will often reflect the entire dataset. For larger datasets, the image may consist of one or more partial images of the dataset. A dataset may be projected concurrently into several NDC sites 22, 24, 26A and 26B. In all NDC sites 22, 24, 26A and 26B, at all times, the projected image will exactly match the current state of the dataset. A channel 116 belonging to the NDC 50 at either of the intermediate NDC sites 26A or 26B may be referred to as an “intermediate channel.”

A channel 116 may exist within an NDC 50 without containing any projections of the data with which it is associated. This would be the normal state of a channel 116 that's participating in the CWS of data.

A CWS condition exists if multiple clients, such as the client workstation 42, are simultaneously accessing the same dataset, and at least one of them is writing the dataset. In this mode of operation, referred to as concurrent mode, images are projected into an NDC site 22, 24, 26A or 26B for only a very brief period between the receipt of the reply from a downstream NDC site, e.g., the receipt by intermediate NDC site 26B of a reply from intermediate NDC site 26A, and the forwarding of the reply upstream, e.g. the forwarding of a reply from intermediate NDC site 26B to NDC client terminator site 24, or the forwarding of the reply into the client intercept routines 102, if the site is the NDC client terminator site 24.
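
The definition of a CWS condition reduces to a simple predicate, sketched below in C. The enumeration site_activity and the function cws_exists are hypothetical names chosen for illustration. The NDC 50 tracks upstream activity through structures shown in the figures rather than through a plain array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client activity on one dataset. */
enum site_activity { SITE_IDLE, SITE_READING, SITE_WRITING };

/* A CWS condition exists when two or more clients are accessing the
 * dataset at the same time and at least one of them is writing it. */
bool cws_exists(const enum site_activity *clients, size_t n)
{
    size_t active = 0;
    size_t writers = 0;

    for (size_t i = 0; i < n; i++) {
        if (clients[i] != SITE_IDLE)
            active++;
        if (clients[i] == SITE_WRITING)
            writers++;
    }
    return active >= 2 && writers >= 1;
}
```

A lone writer, or any number of concurrent readers, does not by this definition create a CWS condition.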

Channels 116 that don't maintain a projected image of data when a CWS condition exists still serve an important function in the overall operation of the digital computer system 20. In addition to data, each channel 116 stores other information that:

-   measures the rate at which the client, e.g. the client workstation 42, consumes data;
-   monitors the client's access pattern, i.e. random or sequential;
-   measures the response latencies for downstream services such as requesting access to data from the NDC server terminator site 22; and
-   monitors the activities of upstream sites to detect the presence of a CWS condition.
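
One plausible way to monitor whether a client's access pattern is random or sequential is to count requests that begin exactly where the previous request ended. The following C sketch illustrates that idea only. The structure access_stats, its fields, and the majority threshold used in access_is_sequential are assumptions and are not taken from the figures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping; zero-initialize before first use. */
struct access_stats {
    uint64_t next_offset;  /* byte just past the end of the previous request */
    unsigned requests;     /* total requests observed */
    unsigned spliced;      /* requests that began exactly at next_offset */
};

/* Records one request against the running statistics. */
void access_note(struct access_stats *s, uint64_t offset, uint64_t length)
{
    if (s->requests > 0 && offset == s->next_offset)
        s->spliced++;
    s->requests++;
    s->next_offset = offset + length;
}

/* Assumed heuristic: the pattern is treated as sequential when more than
 * half of the follow-on requests spliced onto their predecessor. */
bool access_is_sequential(const struct access_stats *s)
{
    return s->requests > 1 && 2 * s->spliced > s->requests - 1;
}
```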

Thus, each channel 116 is much more than just a cache for storing an image of the dataset to which it's connected. The channel 116 contains all of the information necessary to maintain the consistency of the projected images, and to maintain high performance through the efficient allocation of resources. The channel 116 is the basic structure through which both control and data information traverse each NDC site 22, 24, 26A and 26B, and is therefore essential for processing any request. The following sections describe more completely the structure and use of channels 116.

Structure of Channel 116

FIG. 4 discloses the presently preferred structure for the channel 116 in the “C” programming language. The salient features of FIG. 4 are:

-   each channel 116 can be linked into a hash list;
-   each channel 116 can be linked into a channel free list;
-   each channel 116 contains a considerable amount of state information, including:
    -   the dataset handle (identifies: server, filesystem, file) for data with which the channel 116 is associated;
    -   a cached copy of the dataset's attributes;
    -   if the dataset is a directory, a pointer to a cached image of the directory, already formatted for transmission upstream;
    -   an indicator specifying how far write data must be flushed downstream before responding back to the client;
    -   pointers to the current request message that's being processed and any currently outstanding upstream or downstream messages that have been issued by the NDC site 22, 24, 26A or 26B in the process of executing the request;
    -   a pointer to a list of NDC_UPSTREAM_SITE structures that keep track of all upstream activity;
    -   the address of the next level downstream site; and
    -   measurements on the channel data rate, dataset data rate, and a count of the number of requests that exactly spliced onto the end of a previous request; and
-   each channel 116 contains a single instance of a structure for a subchannel 152, illustrated in FIG. 4B, which contains pointers to any NDC buffers 129, illustrated in FIG. 3, into which any portion of the dataset is currently being projected.

Each channel 116, including its built-in subchannel 152, occupies about 500 bytes of RAM. The RAM occupied by any NDC buffers 129, illustrated in FIG. 3, that hold data image projections is in addition to the amount of RAM occupied by each channel 116. However, pointers to the NDC buffers 129 are included in the RAM occupied by each channel 116. Also, all NDC metadata, i.e., information about the named set of data such as file attributes (attr), server name (serverpid), filesystem id (NDC_FH.fsid), and file id (NDC_FH.fid) illustrated in FIG. 4, is projected directly into the channel structure (NDC_STATS and NDC_ATTR).

The channel 116 may contain complete or partial images of a file or of a directory. The channel 116 is capable of projecting an image of a complete file from the NDC server terminator site 22 into the NDC client terminator site 24, even if the file is very large. However, issues of shared resource management will usually preclude projecting large data images from the NDC server terminator site 22 into the NDC client terminator site 24.

Any image of data that is projected from the NDC server terminator site 22 into the NDC client terminator site 24 is always valid and may be directly operated upon by the client workstation 42 either for reading or for writing without requesting further service from downstream NDC sites 26B, 26A or 22. If the client workstation 42 modifies the data, no matter how remote the client workstation 42 may be located from the NDC server terminator site 22, any projected image segments of the data that has just been modified at any other NDC site will be removed before processing the next request for that data at that NDC site.

Subchannels 152

A channel 116 may include one or more channel structures. A channel 116 that includes only a single channel structure, such as that illustrated in FIG. 4, is referred to as a simple channel 116. A simple channel 116 can project a single image of limited size. However, as illustrated in FIG. 8, through the use of a subchannel 152, a simple channel 116 may be extended, thus permitting it to project from a file 156 a segment 158 of contiguous data that is larger than that which can be projected using only a simple channel 116. A channel structure made up of a channel 116 and one or more subchannels 152, illustrated in FIG. 8, may be referred to as a complex channel 116. As described previously and illustrated in FIG. 8, the NDC 50 always projects images of data from a file 156 in segments 158. Each segment 158 illustrated in FIG. 8 is a series of consecutive bytes from the file 156 specified by offset and seg_length variables stored in the structure of a subchannel 152. Moreover, the channel 116 may also include additional subchannels 152 that project discontiguous segments 158 from the file 156. An image projection that is larger than that accommodated by the single subchannel 152 included in a channel 116 requires that the subchannel 152 be extended, thereby creating a complex channel 116. Multiple subchannels 152 are linked via an extent pointer (*ext) 162 of the subchannel 152 to form a logical subchannel that can project an image of any size.
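
The linkage of subchannels 152 may be pictured with the following minimal C sketch. The field names offset, seg_length, ext and next follow the text. The structure layout itself, the assumption that each extension records the length of its own piece in seg_length, and the helper functions logical_length and projection_count are hypothetical and are not the structure of FIG. 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a subchannel; not the structure of FIG. 4. */
struct subchannel {
    uint64_t offset;          /* start of this piece within the file */
    uint64_t seg_length;      /* bytes held by this piece (assumed per extent) */
    struct subchannel *ext;   /* extension of the same logical subchannel */
    struct subchannel *next;  /* next logical subchannel (another segment) */
};

/* Total bytes projected by one logical subchannel, following *ext. */
uint64_t logical_length(const struct subchannel *sc)
{
    uint64_t total = 0;

    for (; sc != NULL; sc = sc->ext)
        total += sc->seg_length;
    return total;
}

/* Number of separate image projections, following *next. */
size_t projection_count(const struct subchannel *first)
{
    size_t count = 0;

    for (; first != NULL; first = first->next)
        count++;
    return count;
}
```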

Multiple Image Projections

Each channel 116 may also support several different, non-overlapping image projections simultaneously. Each projection requires one logical subchannel. A next subchannel pointer (*next) 164 of each subchannel 152 links together the logical subchannels.

The ability to project multiple images of the same dataset facilitates simultaneously servicing several clients, such as the client workstation 42. Small datasets are usually completely projected by a single channel 116, and this single projection is shareable. If several clients, such as the client workstation 42, access a large dataset sequentially but are each operating in different areas of the dataset, then projections are generated as required to provide local images of the segments 158 being accessed by the different client workstations such as the client workstation 42. Furthermore, the NDC 50 may project several images, each image being of a discontiguous segment 158 from a single file, for a single client if that client is performing a significant amount of sequential processing in several different areas of a large file. Under such circumstances, each segment 158 from the file 156 would have its own projection.

If a projected image grows or shifts to such an extent that it would abut or overlap another image, the NDC 50 coalesces both images into a single segment 158. Thus, segments 158 are always separated from each other by at least one byte of non-projected data.

Another characteristic of a channel 116 having multiple projections is that all of its subchannels 152 are ordered in increasing offset into the dataset.
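
The two invariants just stated, namely that segments 158 are ordered by increasing offset and are separated by at least one byte, can be maintained by a routine along the lines of the following C sketch. It uses a plain array in place of linked subchannels 152. The names seg and seg_project are hypothetical, and the caller is assumed to provide room for one additional array element.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical array-based stand-in for the list of projected segments. */
struct seg { uint64_t offset; uint64_t length; };

/*
 * segs[0..n) is sorted by offset with at least one byte between neighbours.
 * Adds a new projection, coalescing it with every segment it overlaps or
 * abuts. The array must have room for n + 1 entries. Returns the new count.
 */
size_t seg_project(struct seg *segs, size_t n, struct seg add)
{
    uint64_t start = add.offset;
    uint64_t end = add.offset + add.length;
    size_t i = 0;
    size_t j;

    /* Skip segments that end strictly before the new one starts. */
    while (i < n && segs[i].offset + segs[i].length < start)
        i++;

    /* Absorb every segment that overlaps or abuts [start, end). */
    j = i;
    while (j < n && segs[j].offset <= end) {
        if (segs[j].offset < start)
            start = segs[j].offset;
        if (segs[j].offset + segs[j].length > end)
            end = segs[j].offset + segs[j].length;
        j++;
    }

    if (j == i) {
        /* Nothing absorbed: open a slot at position i. */
        memmove(&segs[i + 1], &segs[i], (n - i) * sizeof segs[0]);
        n += 1;
    } else if (j - i > 1) {
        /* Several absorbed: close the gap left behind. */
        memmove(&segs[i + 1], &segs[j], (n - j) * sizeof segs[0]);
        n -= (j - i) - 1;
    }
    segs[i].offset = start;
    segs[i].length = end - start;
    return n;
}
```

For example, projecting bytes 10 through 19 into a channel already holding bytes 0 through 9 and bytes 20 through 29 leaves a single segment covering bytes 0 through 29.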

The channel 116, the subchannel 152, and a subchannel 152 extending another subchannel 152 all use the same structure that is disclosed in FIG. 4. When the structure disclosed in FIG. 4 is used as a subchannel 152 or to extend a subchannel 152, some fields remain unused. Although this wastes some space in RAM, it enables complex channels 116 to grow on demand without requiring three different resources and the mechanisms to allocate and control them.

Channel Free List

Channels 116 that are not in active use, even though they probably are still valid and have connections to datasets complete with projections of both data and NDC metadata, are placed on the channel free list. All channels 116 that are not being used for servicing a request are placed on the channel free list. Conversely, any channel 116 that is currently engaged in responding to a request will not be on the channel free list.

The channel free list is formed by linking all free channels together via their av_forw and av_back pointers. The channels 116 on the channel free list are ordered according to the length of time since their last usage. Whenever a channel 116 is used, it is removed from the channel free list and marked C_BUSY. After the process that claimed the channel 116 has completely finished its task, C_BUSY is cleared and the channel 116 is linked onto the end of the channel free list. Repeated use of this simple process results in the “time since last use” ordering of the channel free list.

When the NDC 50 receives a new request specifying a dataset for which there is currently no channel 116 connection, a new channel 116 is allocated and assigned to serve as the pathway to the dataset. When a new channel 116 is required, the least recently used channel 116 is removed from the head of the channel free list, marked as C_BUSY and invalid, all state associated with the prior request is discarded, and the channel 116 is re-allocated to the requesting process.
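
The ordering of the channel free list by time since last use follows directly from removing a channel when it is claimed and appending it to the tail when it is released. The following C sketch illustrates this with a circular doubly linked list. The pointer names av_forw and av_back and the flag name C_BUSY follow the text. The flag's numeric value, the sentinel head, the dataset_id field and the function names are assumptions made for illustration.

```c
#include <stddef.h>

#define C_BUSY 0x1u  /* value assumed here; FIG. 6 lists the actual flags */

/* Simplified stand-in for a channel; not the structure of FIG. 4. */
struct channel {
    struct channel *av_forw;  /* toward more recently used channels */
    struct channel *av_back;  /* toward less recently used channels */
    unsigned flags;
    int dataset_id;           /* hypothetical stand-in for the dataset handle */
};

/* The list head is a sentinel channel that never carries data. */
void freelist_init(struct channel *head)
{
    head->av_forw = head;
    head->av_back = head;
}

/* Task finished: clear C_BUSY and link the channel onto the end of the list. */
void channel_release(struct channel *head, struct channel *ch)
{
    ch->flags &= ~C_BUSY;
    ch->av_back = head->av_back;
    ch->av_forw = head;
    head->av_back->av_forw = ch;
    head->av_back = ch;
}

/* Channel is about to be used: unlink it and mark it C_BUSY. */
void channel_claim(struct channel *ch)
{
    ch->av_back->av_forw = ch->av_forw;
    ch->av_forw->av_back = ch->av_back;
    ch->av_forw = NULL;
    ch->av_back = NULL;
    ch->flags |= C_BUSY;
}

/* New dataset: take the least recently used channel from the head. */
struct channel *channel_alloc(struct channel *head)
{
    struct channel *ch = head->av_forw;

    if (ch == head)
        return NULL;          /* free list is empty */
    channel_claim(ch);
    ch->dataset_id = -1;      /* discard state tied to the prior dataset */
    return ch;
}
```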

There are two caveats to the preceding procedure:

-   A channel 116 that has just been removed from the head of the channel free list may contain modified data or NDC metadata that must be flushed downstream to the NDC server terminator site 22. The presence of a C_DELAYED_WRITE flag in the channel 116 indicates the existence of this condition.
-   A channel 116 may be a complex channel 116 which must be broken up since, initially, all channels 116 begin as simple channels 116 and may grow to become complex channels 116.

The NDC 50 includes routines called channel daemons that perform general maintenance functions on the channels 116 that are needed to keep each NDC site 22, 24, 26A and 26B at a peak level of readiness. The channel daemons perform their function in background mode when the NDC 50 is not busy responding to requests to access data. The NDC 50 invokes the appropriate channel daemon whenever there are no requests to be serviced. During periods of peak load, when requests to access data are pending, the NDC 50 suspends operation of the channel daemons, and the tasks normally performed by the channel daemons are, instead, performed directly by the request processing routines themselves.

Channel Daemons:

-   maintain the channel free list,
-   schedule the loading and unloading of channels 116, and
-   load and unload channels 116.

There are specialized channel daemons that perform each of these functions. A Reaper daemon routine maintains the channel free list, a Loadmaster daemon routine prioritizes the demands of competing channels 116, and Supervisor daemon routines service channels 116 that they receive from the Loadmaster daemon routine to ensure that the channels 116 are prepared to immediately respond to the next incoming request to access data.

The process of claiming a channel 116 from the channel free list occurs while the NDC 50 is servicing a request. Any time required to handle either of the two caveats identified above increases the time required to respond to the request. When there are several NDC sites 22, 24, 26A and/or 26B between the client workstation 42 and the NDC server terminator site 22, the delay at each site may compound until the time to respond to the request from the client workstation 42 becomes unacceptable. To minimize such delays, it is important to reduce the time spent in claiming a channel 116 from the channel free list.

To reduce the time required to claim a channel 116 from the channel free list, the NDC 50 implements the channel free list as five lists that are linked together. The five channel free lists are:

-   CQ_EMPTY: This is a list of channels 116 that have no NDC buffers 129 assigned. Channels 116 on this list may still contain dataset attributes that are still valid. The channels 116 are those that have been used least recently, and are, therefore, the prime candidates for re-assignment if a request to access data requires a new channel.
-   CQ_CLEAN: This is a list of channels 116 that have NDC buffers 129 assigned to them. BD_DIRTY_DATA may not be set on any NDC buffer 129 assigned to a channel 116 that is on this list. Channels that are full of useless data, e.g., data from a request that experienced a fatal disk read error, are marked with C_ERROR, and such channels 116 are enqueued at the front of the CQ_CLEAN list. Channels 116 that have percolated all the way up through a CQ_SERVICE list are enqueued at the back of the CQ_CLEAN list as soon as their data has been flushed downstream toward the NDC server terminator site 22. Data within channels 116 that are on the CQ_CLEAN list is still valid, and may be used if it is requested before the channel 116 percolates its way up through the CQ_CLEAN list.
-   CQ_READY: This is a list of channels 116 that are ready to respond immediately to the next anticipated request to access data from the client workstation 42. Channels 116 that are experiencing requests to access the dataset randomly, or channels 116 that are experiencing requests to access the dataset sequentially and are still able to immediately respond to the anticipated request stream to access data, are usually enqueued at the back of the CQ_READY list when they are returned to the channel free list after being used either for responding to a request to access data, or for pre-fetching data.
-   CQ_SERVICE: Channels 116 on the CQ_SERVICE list have been used recently, and are approaching the point where they will be unable to respond immediately to a request to access data from the client workstation 42. Channels 116 on the CQ_SERVICE list that contain an image of data that has been modified by the client workstation 42 may contain dirty file data or metadata that needs to be flushed downstream toward the NDC server terminator site 22. Channels 116 on the CQ_SERVICE list that contain an image of data that is being read by the client workstation 42 may need to have additional data loaded into them from downstream so they can respond immediately to future requests to access data from the client workstation 42. Occasionally, a channel 116 on the CQ_SERVICE list may simultaneously require both flushing of dirty data downstream, and loading of additional data from downstream.
-   CQ_LOCKED: The channels 116 on this list are hardwired. The channels 116 and all NDC buffers 129 allocated to them are immune from LRU replacement. All intermediate channels 116 in the intermediate NDC sites 26A and 26B are always placed on the CQ_LOCKED list to prevent them from being pulled out from under the corresponding upstream channel(s) 116. Hardwired channels 116 provide dataset connections which respond in a minimum amount of time. By immunizing channels 116 on the CQ_LOCKED list from LRU replacement, the channels 116 can respond swiftly to a request to access data, particularly for applications such as real-time imaging in which minimum delay times are critical.

In the following description of the present invention, the channel free list will often be referred to in the singular, and should be thought of as a single LRU list. The present invention includes the extra complexity of five free lists so channels 116 can be emptied of C_DELAYED_WRITE data, and complex channels 116 broken down into simple channels 116 by channel daemon routines running in the background.

Channels 116 on either the CQ_READY list or the CQ_SERVICE list may contain modified data that represents the current state of the file, i.e., the underlying downstream data has been superseded by modified data from the client workstation 42. When this condition occurs, the NDC buffers 129 assigned to the channel 116 that contain the modified data are flagged as B_DELWRI and the channel 116 is flagged as C_DELAYED_WRITE.

If the NDC 50 needs a channel 116, it first checks the CQ_EMPTY list. If the CQ_EMPTY list has no channels 116, then the NDC 50 checks the CQ_CLEAN list. A channel 116 on this list never has any C_DELAYED_WRITE data, but it might be a complex channel 116 that needs to be broken down into simple channels 116. If the NDC 50 finds a complex channel 116 on the CQ_CLEAN list, it reduces the complex channel 116 to a collection of simple channels 116. One channel 116 is then claimed to respond to the request to access data and all remaining simple channels 116 are enqueued at the end of the CQ_EMPTY list.

If the CQ_CLEAN list is also empty, the NDC 50 searches the CQ_READY list. Because the NDC 50 is in the process of responding to a request to access data, the NDC 50 skips down the CQ_READY list and takes the most convenient channel 116. However, the channel 116 selected by the NDC 50 in this manner must be free of C_DELAYED_WRITE data so that no modified data will be lost.
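The search order described above can be illustrated with a minimal sketch. It is not the NDC 50 implementation: the `Channel` class, the flag value, the use of a `children` list to model a complex channel, and the function name `claim_channel` are all assumptions introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

C_DELAYED_WRITE = 0x1  # hypothetical bit value; the source names only the flag


@dataclass(eq=False)
class Channel:
    name: str
    flags: int = 0
    # Assumption: a "complex" channel is modeled as one that owns sub-channels.
    children: list = field(default_factory=list)


def claim_channel(cq_empty, cq_clean, cq_ready):
    """Claim a channel, searching CQ_EMPTY, then CQ_CLEAN, then CQ_READY."""
    if cq_empty:
        return cq_empty.popleft()
    if cq_clean:
        ch = cq_clean.popleft()
        # Reduce a complex channel to simple channels; claim one and
        # enqueue the remainder at the end of CQ_EMPTY.
        leftovers, ch.children = ch.children, []
        cq_empty.extend(leftovers)
        return ch
    # On CQ_READY, skip any channel still holding delayed-write data.
    victim = next((c for c in cq_ready if not c.flags & C_DELAYED_WRITE), None)
    if victim is not None:
        cq_ready.remove(victim)
    return victim
```

The sketch returns `None` when no suitable channel is found; the source does not say what the NDC 50 does in that case.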

Channel Hash Lists

When the NDC 50 begins processing a new request, the first task is to connect the request to an existing channel 116, if it exists. The channel hash lists enable this connection to be performed very quickly. The first step in the connection function that seeks to find an existing channel 116 is to add the filesystem id and file id together and then divide this sum by the number of hash buckets. The remainder produced by the division operation is used as an index into the array of hash buckets. Each bucket contains a short list of channels 116 that are connected to files whose filesystem id and file id have been hashed into the bucket's index.

Having identified a hash bucket, the next step is to search all the channels 116 on the list for this bucket for an exact match on file server address, filesystem id, and file id. If there is a channel 116 currently connected to the desired dataset, it will be on this list regardless of whether the channel 116 is on or off the channel free list at the moment. Any channel 116 currently connected to a dataset can always be located via this hash mechanism. If a search is performed and the channel 116 isn't located, then none exists.
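The two-step lookup just described (hash to a bucket, then search the bucket for an exact match) might be sketched as follows. The bucket count, the `DatasetKey` and `ChannelHash` names, and the use of Python lists in place of the c_forw/c_back links are assumptions made only for this example.

```python
from dataclasses import dataclass

NUM_BUCKETS = 64  # hypothetical bucket count; the source gives no value


@dataclass(frozen=True)
class DatasetKey:
    server_addr: str  # file server address
    fsid: int         # filesystem id
    file_id: int      # file id


def bucket_index(key, nbuckets=NUM_BUCKETS):
    # Add the filesystem id and file id, then keep the remainder
    # of dividing the sum by the number of hash buckets.
    return (key.fsid + key.file_id) % nbuckets


class ChannelHash:
    def __init__(self, nbuckets=NUM_BUCKETS):
        self.buckets = [[] for _ in range(nbuckets)]

    def link(self, key, channel):
        """Link a newly assigned channel onto its hash chain."""
        self.buckets[bucket_index(key, len(self.buckets))].append((key, channel))

    def find(self, key):
        """Return the channel connected to this dataset, or None."""
        for candidate, channel in self.buckets[bucket_index(key, len(self.buckets))]:
            # Exact match on server address, filesystem id and file id.
            if candidate == key:
                return channel
        return None
```

Note that the server address takes part only in the exact-match comparison, not in the bucket index, so identical filesystem and file ids on two different servers share a bucket but remain distinct entries.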

The c_forw and c_back fields in the structure of the channel 116 disclosed in FIG. 4 are used for linking channels 116 on a hash list. When a channel 116 is removed from the channel free list and re-assigned to access a dataset, c_forw and c_back are set and the channel 116 is immediately linked onto the appropriate hash chain.

Claiming a Channel 116

Routines called ndc_get_channel( ) and ndc_channel_relse( ) make and break connections to channels 116 within an NDC site 22, 24, 26A and 26B.

“Claiming” a channel 116 is the process by which the NDC 50, for the purpose of satisfying a request that it has received to access a new dataset either from a local client via the client intercept routines 102 or from another NDC site via the DTP server interface routines 104, acquires one of the channels 116 that was allocated to the NDC 50 upon initialization of the NDC site. In claiming a channel 116, the ndc_get_channel( ) routine removes the channel 116 from the channel free list, marks the channel 116 C_BUSY, and assigns the channel 116 to a request. Once a channel 116 has been claimed, it is busy and unavailable for use by any other request that the NDC 50 might receive before the channel 116 is released. Thus, a channel 116 is either not busy, and can be found on the channel free list, or it is busy and committed to a request that is currently being processed.

When ndc_get_channel( ) is called to claim a channel 116, one of several situations may arise:

-   The channel 116 doesn't already exist, so a channel 116 is claimed from the channel free list, assigned to servicing the current request, initialized, linked into the appropriate hash chain, and its pointer returned to the caller.
-   The channel 116 exists and it's not busy. The channel 116 is removed from the channel free list, it's marked C_BUSY, and its pointer is returned to the caller.
-   The channel 116 exists and it's busy recalling or disabling image projections at all upstream sites. A NULL pointer is immediately returned to the caller, telling the caller to “back-off” so a consistency operation may complete before the NDC 50 performs any processing on the current request. In this situation, the caller must be a DTP server interface routine 104, since the channel 116 can only recall/disable the channels 116 at upstream NDC sites, such as the NDC sites 26A, 26B, or 24.
-   The channel 116 exists, is busy (C_BUSY is set), and it is not in the process of recalling or disabling the upstream NDC site that issued the current request. If this condition occurs, the requesting process enters a wait state while simultaneously requesting to be reactivated as soon as the channel 116 returns to the channel free list.

The third situation occurs very rarely. Under certain circumstances, an NDC site, e.g., intermediate NDC site 26A, must send a message to its upstream sites, e.g., NDC sites 26B and 24, that recalls projected images of data that have been modified by a client, e.g., the client workstation 42, and that disables all projected images of data that are being read. Such communications are referred to as recall/disable messages. If an NDC site, e.g., intermediate NDC site 26A, receives a request from an enabled upstream site, e.g., intermediate NDC site 26B, that is projecting an image of data, and the request is directed at a channel 116 that is awaiting the response to a recall/disable message that has been sent to upstream sites 26B and 24, a deadlock situation is imminent. The request that's just been received at this NDC site, e.g., intermediate NDC site 26A, can't be processed until the channel 116 becomes available. But the channel 116 won't ever be freed until all sites, e.g., NDC sites 26B and 24, have responded to the recall/disable messages. However, the recall/disable message will never be processed at the upstream site, e.g., NDC sites 26B and 24, that just transmitted the new request because the channels 116 at those sites are busy waiting for the response to their outstanding requests.

To avoid such a deadlock condition, whenever an upstream request attempts to claim a channel 116 and discovers that the channel 116 is busy, additional investigation is performed. If the channel 116 is busy processing another client's downstream request, then the NDC 50 just waits until the channel 116 becomes free, after which it claims the channel 116, and returns its pointer to the caller.

However, if the channel 116 is busy processing an upstream request, which is a request from the CCS to all upstream sites to either recall or disable their images of projected data, and if the NDC site originating the current request, i.e., the NDC site that's trying to claim the channel 116 right now, is one of those upstream sites, then the ndc_get_channel( ) routine does not pause and await the release of the channel 116. Rather, the ndc_get_channel( ) routine immediately returns a NULL pointer to instruct the caller to release its channel 116.

When a DTP server interface routine 104 calls the ndc_get_channel( ) routine and receives a NULL pointer in return, the DTP server interface routine 104 must reject the request it received from upstream. The response is flagged with NDC_RSP_REQUEST_REJECTED to inform the upstream site that this request has been rejected. If there are several NDC sites, such as intermediate NDC site 26B, between the NDC site that initially rejects a request and the NDC client terminator site 24, the rejection must pass up through all the sites until the rejection reaches the client intercept routine 102 of the NDC 50 that originally received the request. Upon receiving a rejection, the client intercept routine 102 of the NDC client terminator site 24 then backs off. In general, backing-off is a procedure in which:

-   a process, such as the client intercept routine 102, is notified that a request has been rejected;
-   the process, such as the client intercept routine 102, then releases its channel 116; and
-   the recall/disable process claims the channel 116, flushes or invalidates any projected images of the dataset stored in the NDC buffers 129, and then releases the channel 116 so the original process, such as the client intercept routine 102, can re-claim the channel 116 and finally service the client's request.

Backing-off is always performed within client intercept routines 102, and every client intercept routine 102 must be capable of performing this function.

The client intercept routine 102 does not pass rejections from the NDC 50 back to a network client, such as the client workstation 42. The client workstation 42 remains totally unaware of the consistency operations performed by the NDCs 50.

Messages being passed upstream between the NDC sites 22, 26A, 26B and 24 always take precedence if they collide with a message for the same dataset being passed downstream between the NDC sites 24, 26B, 26A and 22. Sending a message upstream to disable or recall projected images at all upstream sites is the first step performed by the CCS in processing a message that has just created a CWS condition. If a collision occurs between an upstream message and a downstream message, the message being passed downstream has already lost the race to the CCS by a wide margin.

As described above, in response to a request to claim a channel 116, the ndc_get_channel( ) routine returns either:

1. a pointer to a new or old channel 116 to the calling routine after having waited a short interval if necessary; or
2. a NULL pointer to indicate that the request for a channel 116 has been rejected and the calling routine must wait and allow consistency operations to proceed.
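The four situations that ndc_get_channel( ) distinguishes reduce to a small decision table. The following sketch only classifies the outcome; the enum, the function name and the boolean parameters are illustrative assumptions and do not reproduce the routine's actual interface.

```python
from enum import Enum, auto


class Outcome(Enum):
    CLAIM_NEW = auto()       # no channel exists: claim one from the free list
    CLAIM_EXISTING = auto()  # channel exists and is not busy
    REJECT = auto()          # return NULL: caller must back off
    WAIT = auto()            # sleep until the channel is released


def get_channel_outcome(exists, busy, recalling_upstream, requester_is_upstream_target):
    """Classify a claim attempt (hypothetical helper, not the real routine)."""
    if not exists:
        return Outcome.CLAIM_NEW
    if not busy:
        return Outcome.CLAIM_EXISTING
    # Busy with a recall/disable aimed at the very site now asking for the
    # channel: waiting would deadlock, so reject immediately.
    if recalling_upstream and requester_is_upstream_target:
        return Outcome.REJECT
    # Busy for any other reason: wait for the channel to be released.
    return Outcome.WAIT
```

Only the `REJECT` outcome corresponds to the NULL return; the other three eventually yield a channel pointer.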

Channel Request Processing Operations

After a channel 116 has been claimed for the purpose of processing a request, the channel 116 is committed to that request and no other request can use the channel 116 until the current request has completely finished and released the channel 116.

Channel commitment is a process by which client requests directed at the same dataset are sequenced such that each request is fully processed before any processing begins on the next request. However, multiple NDC sites 22, 26A, 26B and 24 may simultaneously receive requests for the same dataset. That is, two or more NDC sites 22, 26A, 26B and 24 may begin processing requests for the same dataset at about the same time, and both of them may be unaware that any other NDC site is accessing the dataset. The NDC consistency mechanism handles all such cases so it appears that there is a single queue for accessing the dataset. However, due to processing and transmission delays among the NDC sites 22, 26A, 26B and 24, the order in which each client requests access to the dataset does not determine which request is processed first. Rather, the request to be processed is the first one received by the CCS as described in greater detail below. Thus, clients that are “closer” to the CCS have a slight advantage in processing priority. This slight advantage probably cannot be detected by application programs executed by the client, such as the client workstation 42.

The concept of committing a channel 116 to a single request until the request has been satisfied is essential to the consistency control mechanism of the NDCs 50. For the simple cases of dataset access in which there is no CWS, NDC sites 22, 26A, 26B and 24 operate autonomously, which means that the channel 116 is released as soon as the operation at the NDC site 22, 26A, 26B and 24 completes. That is, the channel 116 at each NDC site 22, 26A, 26B, and 24 is only committed until the response has been dispatched to the requesting client.

If a CWS condition exists, all NDC sites from the NDC client terminator site 24 down to and including the CCS (which may be located at NDC site 26B, 26A or 22) operate in concurrent mode. When operating in concurrent mode, channels 116 supporting a write operation must remain committed beyond the point at which they dispatch their response to the upstream NDC site. The channels 116 operating in concurrent mode at each NDC 50 remain committed until the upstream NDC site releases them by transmitting either an NDC_FLUSH or an NDC_RELEASE message. For requests from clients to read a dataset when a CWS condition does not exist, the channel 116 is released as soon as the response has been dispatched to the requesting client. Concurrent mode operations are explained more fully below.

Channel Read Operations

When a channel 116 receives a request to read a dataset, it attempts to satisfy the request directly from images already being projected within the channel 116. If additional data is required from downstream, the channel 116 employs a mash and load technique to fetch the downstream data.

As the original client request ripples downstream through successive NDC sites 26B, 26A and 22:

-   the DTP server interface routine 104 at each NDC site 26B, 26A or 22 claims a channel 116 that is committed to servicing the request;
-   the incoming request is mashed against the image(s) already being projected within the channel 116 at that NDC site 26B, 26A or 22; and
-   the NDC 50 at that NDC site 26B, 26A or 22 generates and dispatches a request downstream that specifies only the data that must be loaded from below in order to satisfy the request.

The request propagates from NDC site to NDC site toward the NDC server terminator site 22 until either:

1. the request mashes against an image, or set of images, of all the data requested by the immediately preceding NDC site; or
2. the request reaches the NDC server terminator site 22.

In either case, when the request reaches an NDC site having all the requested data, there exists a series of channels 116 stretching back from that NDC site to the NDC client terminator site 24. All channels 116, committed to the request in progress, effectively have a protective shield surrounding them. No other request to access data may penetrate this barrier at any point.
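One way to picture the mash step is as an interval subtraction: the byte range requested from upstream is compared against the ranges already projected at the site, and only the uncovered gaps are requested from downstream. The sketch below assumes half-open `(start, end)` byte ranges and a function name `mash`; the source does not specify how images or requests are represented, so this is an illustration rather than the disclosed technique.

```python
def mash(request, cached):
    """Return the sub-ranges of `request` not covered by `cached`.

    request: (start, end) half-open byte range.
    cached:  iterable of (start, end) ranges already projected at this site.
    """
    start, end = request
    missing = []
    cursor = start  # everything before `cursor` is known to be covered
    for c_start, c_end in sorted(cached):
        if c_end <= cursor:
            continue            # image lies entirely before the cursor
        if c_start >= end:
            break               # image lies entirely after the request
        if c_start > cursor:
            missing.append((cursor, c_start))  # gap before this image
        cursor = max(cursor, c_end)
        if cursor >= end:
            break
    if cursor < end:
        missing.append((cursor, end))          # trailing gap
    return missing
```

For instance, `mash((0, 100), [(10, 20), (50, 120)])` yields `[(0, 10), (20, 50)]`, and an empty result means the site holds all the requested data, which corresponds to the first stopping condition listed above.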

If the chain of committed channels 116 doesn't stretch all the way to the NDC server terminator site 22, it is possible that another request for the same data might be made at an NDC site that is downstream from this chain of channels 116. The downstream NDC site must issue a recall/disable message to all upstream NDC sites. Upon the arrival of this recall/disable message at the downstream end of the chain of channels 116, it is queued to await the availability of the channel 116. As soon as the channel 116 at this NDC site responds to a load request, it is freed from its upstream commitment. The channel 116 then initiates processing on the recall/disable message and forwards the recall/disable message upstream. The recall/disable message propagates much faster than a load response because the load response has to transfer data. Thus, the recall/disable message will closely follow the load response all the way back to the NDC client terminator site 24. As soon as the client intercept routine 102 at the NDC client terminator site 24 dispatches a response to the client, such as the client workstation 42, the recall/disable message invalidates all projected images at the NDC client terminator site 24.

Another aspect of the channel load operation is that a downstream channel 116 never begins to respond to a request until all of the requested data is cached within the channel 116. And, when the response is finally sent, the channel 116 need not transmit all of the requested data. The response by the channel 116 at a downstream NDC site may be flagged as a partial response indicating that more data remains to be transmitted upstream toward the NDC client terminator site 24. Upon receiving a partial response, the upstream NDC site immediately issues a request to the downstream NDC site for the remaining data. The downstream NDC site's response to this request may be either a full or partial response. The upstream NDC site keeps requesting more data until it receives a complete response to its original request. The downstream NDC site never releases the channel 116 until it has dispatched a full response to the upstream NDC site. In this manner, the NDCs 50 respond to each request to access data from a client site, such as the client workstation 42, as an atomic operation.
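The partial-response exchange amounts to a loop on the upstream side that keeps asking for the remainder until a response arrives without the partial flag. In the sketch below the downstream site is stood in for by a plain function returning `(chunk, partial)`; the names `make_downstream` and `load`, and the per-response size limit, are assumptions for illustration.

```python
def make_downstream(data, max_chunk):
    """Build a stand-in for a downstream site that holds `data` and sends
    at most `max_chunk` bytes per response (max_chunk must be > 0)."""
    def respond(offset, length):
        available = data[offset:offset + length]
        chunk = available[:max_chunk]
        # Flag the response as partial when more requested data remains.
        partial = len(chunk) < len(available)
        return chunk, partial
    return respond


def load(respond, offset, length):
    """Upstream side: re-request the remainder until a full response."""
    received = bytearray()
    while True:
        chunk, partial = respond(offset + len(received), length - len(received))
        received += chunk
        if not partial:
            return bytes(received)
```

With ten bytes held downstream and a four-byte limit, `load(respond, 2, 7)` issues two requests, `(2, 7)` and then `(6, 3)`, before returning bytes 2 through 8.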

Channel Write Operations

Datasets are always written at the furthest upstream NDC site possible. The sequence of operations performed in writing a dataset is to:

-   load into the NDC site an image of the portion of the dataset that will be overwritten;
-   write to the image of the dataset projected into the NDC site; and
-   flush the buffers that contain modified data downstream toward the NDC server terminator site 22.

The following sections describe the three phases of a write operation in greater detail.

Load Phase

The NDC 50 loads each block of data that is not already present in the NDC buffers 129 and that will be only partially written into the NDC buffers 129. The blocks are loaded by calling an ndc_load( ) routine with a “func” argument of “C_WRITE” to inform the ndc_load( ) routine that it's loading data to be overwritten by a write operation. The flow of the ndc_load( ) routine invoked with the argument C_WRITE is generally the same as it is for a read request, but there are the following differences:

-   If this is the first write operation since the channel 116 was created, the downstream NDC site must be informed of the write activity even if all necessary data is already present in the NDC buffers 129 of this NDC 50. If this is the first time the dataset will be written at any NDC site, the message informing the downstream NDC site that a write operation is being performed propagates all the way to the NDC server terminator site 22. Thus, if any other client becomes active on the dataset, the CWS condition will be detected.
-   Blocks that will be completely overwritten don't need to be loaded upstream to the NDC site where the write operation is being performed. Under such circumstances, the ndc_load( ) routine at the NDC site, such as the NDC client terminator site 24, can allocate empty NDC buffers 129 to receive the data being written to the dataset.
-   As downstream NDC sites respond to requests for loading the data needed to perform the write operation, they are informed of the purpose of the request (message flags==NDC_SITE_WRITING), and they also are informed whether the modified data will be flushed downstream through the NDC site at the conclusion of the write operation. The initial load request also specifies a “flush-level” that specifies the security required for the modified data. Each NDC site between the NDC site writing the data and the NDC server terminator site 22 compares the flush-level to its own capabilities. If any intervening NDC site is able to provide the indicated level of security for the modified data, it flags the load request it is about to issue to its downstream NDC site with NDC_FLUSH_CONTAINED. Thus, each NDC site is able to determine the earliest moment at which the channel 116 can be released from its current commitment.

    If the upstream NDC site is enabled for caching, then the channel 116 can be released as soon as the data has passed through the NDC site. The associated NDC_UPSTREAM_SITE structure has noted that there is write activity occurring at the upstream site. If any other client should become active on the dataset, all modified data will be recalled from above before any transactions from the new client are processed.
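The rule for which blocks the load phase must fetch can be shown with a short sketch: a block touched by the write is loaded only if it is absent from the buffers and will be partially written, while a block that will be completely overwritten simply receives an empty buffer. The block size, the function name `plan_write` and the set-of-block-numbers model of the NDC buffers 129 are assumptions for illustration; end-of-file handling is ignored.

```python
BLOCK_SIZE = 8192  # hypothetical block size; the source gives no value


def plan_write(offset, length, cached_blocks, block_size=BLOCK_SIZE):
    """Classify the blocks touched by a write of [offset, offset + length).

    Returns (to_load, to_allocate): block numbers that must be loaded from
    downstream, and block numbers that only need an empty buffer.
    `length` must be greater than zero.
    """
    end = offset + length
    first = offset // block_size
    last = (end - 1) // block_size
    to_load, to_allocate = [], []
    for blk in range(first, last + 1):
        if blk in cached_blocks:
            continue  # an image of this block is already present
        blk_start = blk * block_size
        fully_covered = offset <= blk_start and end >= blk_start + block_size
        if fully_covered:
            to_allocate.append(blk)  # completely overwritten: no load needed
        else:
            to_load.append(blk)      # partially written: load existing data
    return to_load, to_allocate
```

With a block size of 10, a 30-byte write at offset 5 touches blocks 0 through 3; if block 3 is already cached, only block 0 must be loaded and blocks 1 and 2 get empty buffers.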

If two NDC sites share a common RAM through which data passes, that data does not “clear” the downstream NDC site until it has “cleared” the upstream NDC site. In this situation, the downstream NDC site must not release the channel 116 when it responds to the original request from upstream. Instead, the downstream NDC site leaves the channel 116 busy until it receives another message from the upstream NDC site informing it that the returned data has now “cleared” the upstream NDC site. This prevents the downstream NDC site from modifying or discarding data upon which the upstream NDC site is still operating.

Write Phase

After buffers in the NDC buffer pool 128 have been allocated to receive the write data, the NDC 50 performs the write operation and all NDC buffers 129 that are modified are marked as being “dirty.”

Flush Phase

After the NDC 50 completes the write operation, only one task remains to be accomplished before a response can be dispatched to the client, such as the client workstation 42. The write data that has been entrusted to this site must be secured to the level that has been requested by the client, such as the client workstation 42, or demanded by the NDC server terminator site 22 that owns the data. That is, the NDC site at either end of each write transaction may specify the security level. The highest level specified by either end of the write transaction will prevail. Either end of a write transaction may specify any of the following security levels:

NDC_FLUSH_TO_SERVER_DISK

NDC_FLUSH_TO_SERVER_STABLE_RAM

NDC_FLUSH_TO_SITE_DISK

NDC_FLUSH_TO_SITE_STABLE_RAM

NDC_FLUSH_TO_NOWHERE

If neither the client, such as the client workstation 42, nor the NDC server terminator site 22 cares very much whether written data is occasionally lost, and both are willing to trade data security for data speed, then the flush phase may be bypassed if both ends of a write transaction specify the security level NDC_FLUSH_TO_NOWHERE. In this case, the write operation has now been completed.

However, if either end of a write transaction specifies a security level higher than NDC_FLUSH_TO_NOWHERE, an ndc_flush( ) routine will be called to flush all dirty NDC buffers 129 to an NDC site with an acceptable level of security. Note that if the level is NDC_FLUSH_TO_SITE_STABLE_RAM and the dirty data at this NDC site is already stored in stable RAM, such as battery backed RAM or FLASH RAM, from which it will not be lost in the event of a power failure, then the ndc_flush( ) routine returns immediately.
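The rule that the highest level specified by either end prevails can be expressed with an ordered enumeration. The sketch below assumes that the levels are listed above from most to least secure, so NDC_FLUSH_TO_SERVER_DISK ranks highest; the source does not state the ordering explicitly, and the numeric values, function names and `data_in_stable_ram` parameter are illustrative only.

```python
from enum import IntEnum


class FlushLevel(IntEnum):
    # Assumed ranking: a larger value means a stricter security level.
    NDC_FLUSH_TO_NOWHERE = 0
    NDC_FLUSH_TO_SITE_STABLE_RAM = 1
    NDC_FLUSH_TO_SITE_DISK = 2
    NDC_FLUSH_TO_SERVER_STABLE_RAM = 3
    NDC_FLUSH_TO_SERVER_DISK = 4


def effective_level(client_level, server_level):
    """The highest level specified by either end of the transaction prevails."""
    return max(client_level, server_level)


def flush_needed(client_level, server_level, data_in_stable_ram=False):
    """Decide whether the flush phase has any work to do at this site."""
    level = effective_level(client_level, server_level)
    if level == FlushLevel.NDC_FLUSH_TO_NOWHERE:
        return False  # both ends accept possible loss: bypass the flush phase
    if level == FlushLevel.NDC_FLUSH_TO_SITE_STABLE_RAM and data_in_stable_ram:
        return False  # dirty data is already held in stable RAM at this site
    return True
```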

The moment the NDC 50 modifies the data, the NDC buffer 129 is tagged as dirty. If the data in a dirty NDC buffer 129 is not flushed downstream at the first opportunity, which occurs immediately before the channel 116 is released at the conclusion of the processing of the write request, then the channel 116 is flagged as C_DELAYED_WRITE.

If a CWS condition does not exist, and if both the client, such as the client workstation 42, and the NDC server terminator site 22 aren't concerned about losing modified data, the data flows upstream to an NDC site where it remains for an extended period of time while being modified. Eventually, the client will stop accessing the dataset, and sometime after that the channel 116 will be moved from the CQ_READY list to the CQ_CLEAN list by a Flush daemon routine. When the channel 116 is moved from the CQ_READY list to the CQ_CLEAN list, any dirty NDC buffer 129 that hasn't been flushed downstream for security reasons and is still lingering about, will be flushed at this time.

Modified data in the NDC buffers 129 of an NDC 50 becomes characterized as C_DELAYED_WRITE data if it was not flushed downstream at the first opportunity upon releasing the channel 116 at the end of a write operation. Dirty data isn't C_DELAYED_WRITE data until the routine that could have flushed the data downstream has been bypassed. When such data is finally flushed downstream, the C_DELAYED_WRITE flag is removed.

If, as part of the load phase of a write request, data is pumped upstream through an NDC site that does not cache an image of the data, the channel 116 at that NDC site must not be released. Under such circumstances, the NDC site that is writing the dataset will soon be flushing data back downstream through this NDC site as the last phase of responding to a write request.

Channel Maintenance Operations

In general, file servers, such as the NDC server terminator site 22, are often underutilized, with their processor(s) spending a significant percentage of their time waiting for work. When engaged in processing requests as described above, NDCs 50 postpone all operations that are not essential to completing the responses. At such times, each NDC 50 performs only those operations absolutely required to respond to the requests. When there are no requests awaiting processing, the NDC 50 activates channel daemons to use the processor's “idle” time for preparing for the next volley of requests that will eventually arrive. Any process of the NDC 50 involved in directly servicing a client request preempts all daemons as soon as the current daemon, if one is operating, relinquishes control.

The Reaper Daemons

The NDC 50 invokes a Reaper daemon routine as a background task whenever the number of channels 116 enqueued on the CQ_EMPTY list drops below the CQ_EMPTY_LOW_THRESHOLD. Responding to this condition, the Reaper daemon routine iteratively removes the channel 116 at the front of the CQ_CLEAN list and releases all NDC buffers 129 assigned to it. If the channel 116 removed from the front of the CQ_CLEAN list is a complex channel 116, the Reaper daemon routine reduces it to a collection of simple channels 116, all of which the Reaper daemon routine places at the front of the CQ_EMPTY list. At this point in the process, the channel 116 from which the Reaper daemon routine removed all the other channels 116 may still contain valid data attributes. Under such circumstances, the Reaper daemon routine enqueues the simple channel 116 at the back of the CQ_EMPTY list because a possibility still exists that the channel 116 may be claimed for responding to a request to access the same dataset before it percolates up to the front of the CQ_EMPTY list to be claimed for responding to a request to access a different dataset.

At the end of each iterative cycle of removing a channel 116 from the front of the CQ_CLEAN list and placing one or more channels 116 on the CQ_EMPTY list, the Reaper daemon routine checks to see if any new requests to access data have been received by the NDC 50. If a new request has been received, the Reaper daemon routine relinquishes control to the foreground task that will respond to the request. The Reaper daemon routine resumes operation only when there are no longer any pending requests to access data.

If the number of channels 116 enqueued on the CQ_EMPTY list exceeds the CQ_EMPTY_HIGH_THRESHOLD, the Reaper daemon suspends its operation and will not again resume operating until the number of channels 116 enqueued on the CQ_EMPTY list again drops below the CQ_EMPTY_LOW_THRESHOLD.
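The two thresholds give the Reaper daemon a hysteresis band: it starts when CQ_EMPTY falls below the low threshold and keeps working until CQ_EMPTY exceeds the high threshold. A minimal sketch follows, assuming a `Reaper` class, a `request_pending` callback and plain deques for the lists; the breakdown of complex channels and the release of NDC buffers 129 are omitted.

```python
from collections import deque


class Reaper:
    """Hysteresis sketch: start below `low`, stop once above `high`."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self.active = False

    def run(self, cq_empty, cq_clean, request_pending=lambda: False):
        """Run one background pass; return the number of channels moved."""
        if len(cq_empty) < self.low:
            self.active = True  # dropped below CQ_EMPTY_LOW_THRESHOLD
        moved = 0
        while self.active and cq_clean:
            # Take the channel at the front of CQ_CLEAN (its buffers are
            # assumed to be released here) and enqueue it on CQ_EMPTY.
            cq_empty.append(cq_clean.popleft())
            moved += 1
            if len(cq_empty) > self.high:
                self.active = False  # exceeded CQ_EMPTY_HIGH_THRESHOLD
            elif request_pending():
                break  # yield to the foreground request; resume later
        return moved
```

Because `active` is only cleared above the high threshold, a pass interrupted by a pending request continues on the next idle call even if CQ_EMPTY is by then between the two thresholds.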

The Flush Daemon

A Flush daemon routine locates channels 116 on the CQ_LOCKED, CQ_SERVICE, or CQ_READY lists that have been flagged as C_DELAYED_WRITE, and flushes downstream toward the NDC server terminator site 22 all NDC buffers 129 assigned to such channels 116 that are flagged as B_DELWRI. After a channel 116 has been processed by the Flush daemon routine, the channel 116 is enqueued at the end of the CQ_CLEAN, the CQ_SERVICE, or the CQ_LOCKED list depending upon the flags that are set in the channel 116.

The Loadmaster Daemon

The NDC 50 invokes a Loadmaster daemon routine whenever there are no responses pending to requests to access data. The Loadmaster daemon routine checks channels 116 enqueued on the CQ_SERVICE list and assigns them individually to Supervisor daemon routines which perform the services required by the channel 116. After a channel 116 has been serviced, it is enqueued on the end of the CQ_READY list.

The Supervisor Daemons

The Supervisor daemon routines receive channels 116 that have been removed from the CQ_SERVICE list by the Loadmaster daemon routine, forecast future requests to access data that will be forthcoming from the client(s), such as the client workstation 42, and generate any requests for services from downstream NDC sites that are necessary so the channel 116 can respond immediately to a request from the client to access data. After the NDC 50 receives responses from downstream NDC sites to the requests generated by the Supervisor daemon routine, the channel 116 is enqueued at the end of the CQ_READY list.

The Loader Daemon

Loader daemon routines are low level routines that perform simple asynchronous tasks needed for the operation of the NDC 50, such as submitting a request to a downstream NDC site or to a disk subsystem, and then waiting for a response to that request.

Channel Release

After a request has been completely serviced at an NDC site 22, 24, 26A, or 26B, the channel 116 is released. The process for releasing channels 116 operates as follows:

-   If dirty data is being projected within the channel 116, the NDC 50 calls the ndc_flush( ) routine to ensure that all modified data is secured to a level acceptable to both the client and the server.
-   If the downstream channel 116 is still committed, the NDC 50 sends it a release message and waits until a response is received. The release message may, in some instances, propagate downstream through all NDC sites until it reaches the NDC server terminator site 22.
-   If one or more processes are waiting for this channel 116 or any channel 116, all of them are scheduled to run.
-   Enqueue the channel 116 at the tail of the channel free list.
-   Reset channel flags: C_BUSY, C_WANTED, C_ASYNC, and others.

After all of the preceding operations have been performed, the channel 116 becomes available for use by any other request that has already been received or will arrive in the future.

Channel Death

Most channels 116 eventually die. The primary cause of death is invariably lack of use. As long as channels 116 are continually used, they continue to live. When a channel 116 dies, the following operations are performed:

-   Any dirty data that is still being retained within the channel 116 is flushed downstream toward the NDC server terminator site 22.
-   If the downstream channel 116 is still committed, the NDC 50 sends it a notification that the channel 116 is in the process of dying. This notification will piggyback on any dirty data being flushed downstream. However, at this point there usually isn't any dirty data still lingering in the channel 116. The thresholds established for the various channel daemons cause them to flush modified data downstream toward the NDC server terminator site 22 more quickly than the channel daemons reclaim channels 116.
-   After receiving a response from the NDC 50 at the downstream NDC site 22, 26A or 26B to the decease notification, the NDC 50 releases all resources allocated to the channel 116 and the channel 116 is flagged as invalid and empty.
-   If the death of the channel 116 was initiated by a demand for a new channel 116, the channel 116 is returned to the requesting process.
-   If the death of the channel 116 was caused by the operation of a channel daemon, the channel 116 is enqueued at the head or tail of the CQ_EMPTY free list, depending upon whether or not the attributes for the dataset stored in the channel 116 remain valid.

Only channels 116 located at the NDC client terminator site 24 ever suffer death by lack of use. Downstream channels 116 are always enqueued on the CQ_LOCKED free list when they're not busy. Channels 116 on the CQ_LOCKED free list, immune against LRU replacement, only die when notified by their last upstream channel 116 that it is dying, or when the upstream site fails to respond to status queries and is presumed to be dead.

Downstream channels 116 are the communication links that bind the channels 116 of the NDC client terminator site 24 to the NDC server terminator site 22. Downstream channels 116 cannot be re-allocated without isolating all upstream channels 116 from consistency control operations. If an NDC site becomes isolated temporarily from the network due to a communications failure and if any other clients remaining on the network process datasets for which the isolated NDC sites have active channels 116, after communications are restored any data modifications performed at the formerly isolated NDC site must be rejected by downstream NDC sites when the formerly isolated NDC site subsequently attempts to flush the data back downstream toward the NDC server terminator site 22. Thus, downstream channels 116 only die when the upstream channel 116 dies or, at least, is thought to be dead.

Data image projections are sustained in a downstream channel 116 only when that channel 116 has multiple upstream connections and, even then, only under certain circumstances. So, downstream channels 116 rarely retain resources of the NDC site when enqueued on the channel free list. Only the channel structure itself, approximately 500 bytes, must remain committed to providing the linkage between the upstream and downstream sites.

When projected, NDC metadata, e.g., filesystem and file attributes, is always stored directly within the channel structure. This means that idle downstream channels 116 still retain information about the dataset to which they're connected.

The three events that can trigger the death of a channel 116 are:

-   The channel 116 advances to the head of the channel free list and a request is made for a new channel 116. When this occurs, after flushing any dirty data within the channel 116 at the head of the channel free list downstream toward the NDC server terminator site 22, the channel 116 is re-allocated to support accessing a new dataset after downstream NDC sites have been properly notified that the channel 116 is dying.
-   The NDC 50 receives a decease notification from the last remaining upstream NDC site that is accessing the dataset. The decease notification message causes the downstream NDC site to enter the death sequence and may result in a decease notification propagating further downstream toward the NDC server terminator site 22.
-   A channel usage timer indicates that there has been no activity on the channel 116 for quite a while. If the channel 116 is located at the NDC client terminator site 24, it can just be killed at this point. If the channel 116 is located downstream from the NDC client terminator site 24, the channel 116 must send a status query message to all its upstream connections. This status query message indicates the urgency with which the downstream NDC site wishes to kill the channel 116. After responding to the status query message, the upstream NDC client terminator site 24 may kill its channel, but the upstream NDC client terminator site 24 need not do so. However, upstream NDC sites must respond within a reasonable interval to the status query from the downstream NDC site or the downstream NDC site will assume the upstream channel 116 has died.

NDC Inter-Site Operations

Both control and data information must be communicated between NDC sites 22, 26A, 26B and 24. Data communicated between NDC sites 22, 26A, 26B and 24 is always one or more byte sequences of the named dataset. Control information is a bit more complicated, and can be categorized as follows:

-   NDC metadata is information about the named dataset such as: filesystem and file attributes, server name, filesystem id, and file id.
-   DTP control is information generated by and used by the NDC sites 22, 26A, 26B and 24 to ensure the consistency of all delivered data and NDC metadata.

DTP control information is interwoven into the fabric of the DTP, the protocol through which both data and NDC metadata are passed between NDC sites 22, 26A, 26B and 24.

FIG. 9 is a table written in the C programming language that lists various different types of DTP messages 52 that may be exchanged between pairs of NDC sites, such as the NDC sites 22, 26A, 26B, and 24. FIG. 10 defines a data structure in the C programming language that is used in assembling any of the various different DTP messages 52 listed in FIG. 9. FIGS. 11A through 11I define data sub-structures in the C programming language that are incorporated into the channel 116 illustrated in FIG. 4 and in the data structure for DTP messages 52 illustrated in FIG. 10. FIG. 12 defines a structure in the C programming language that is used in forming chains of DTP messages 52 thereby permitting several DTP messages 52 to be exchanged between NDC sites as a single atomic communication.

Metadata for each channel 116 consists of all of the data stored in each channel 116 except for the data requested by a client, such as the client workstation 42, that is stored in the NDC buffers 129. Two data structures in each channel 116 contain the metadata that is most vital to the performance of the channel 116. FIG. 13A defines a data structure NDC_ATTR in the C programming language that specifies information about the named set of data to which the channel 116 is attached. FIG. 13B defines a data structure NDC_STATS in the C programming language that contains information about the file system on which the dataset resides.

Described below are the various modes in which the NDC sites 22, 24, 26A and 26B operate, and the consistency operations that must be performed between the NDC sites 22, 24, 26A and 26B.

Modes of Operation

An NDC site 22, 24, 26A and/or 26B operates in one of two modes, autonomous or concurrent.

Autonomous Mode of Channel Operation

Whenever possible, a channel 116 services a request using only locally available resources. For clients that are accessing data sequentially, the channel 116 aggressively pre-fetches or pre-buffers ahead of the client's current requests to access data. This mode of operation for channel 116 is referred to as “autonomous.” A channel 116 is said to have operated autonomously whenever it responds to a request from a client, such as the client workstation 42, using only data and NDC metadata cached at its NDC site 22, 24, 26A or 26B prior to receiving the request. Datasets no larger than 144 k bytes are usually completely stored in the NDC buffers 129 at the NDC client terminator site 24, enabling all requests to access datasets smaller than 144 k bytes to be serviced autonomously by the NDC client terminator site 24.

NDC sites that are permitted to cache projected images of data have a potential to operate autonomously. A channel 116 located at an NDC site that is not permitted to cache projected images of data cannot operate autonomously. Channels 116 located at NDC sites that are not permitted to cache projected images of data must always transmit a request downstream to an NDC site in responding to each request from a client, such as the client workstation 42.

Autonomous operation of channels 116 is the major cornerstone upon which very large scale distributed file systems can be built. Autonomous operation of channels 116 provides the basis for:

-   Quick response times. Since the channel 116 of an NDC site 22, 24, 26A or 26B that is operating autonomously doesn't need to communicate with downstream NDC sites in responding to a request to access data from a client, the client, such as the client workstation 42, does not experience any of the delays inherent in such communication.
-   High bandwidth data transfers. If the NDC client terminator site 24 is located within the client workstation 42, the data transfer rate can be extremely high (50 to 100 Mbytes/sec). A response from the channel 116 to a client's request to access data when both the channel 116 and the client operate in the same computer need only consist of a return of pointers to the data that the channel 116 had previously stored in the NDC buffers 129 of the NDC client terminator located within the client workstation 42.
-   Network scalability. For the average dataset, channels 116 located in NDC sites 26A, 26B or 24 that operate autonomously place no load on downstream NDC sites 22, 26A or 26B after the dataset has been loaded into the NDC client terminator site 24. Downstream NDC sites 22, 26A or 26B must initially supply data and NDC metadata to the channel 116 in the NDC client terminator site 24 that operates autonomously. However, once the data and NDC metadata are respectively stored in the NDC buffers 129 of the channel 116 of the NDC 50, the client, such as the client workstation 42, may access the data and metadata many times without requiring any further communication between the NDC client terminator site 24 and the downstream NDC sites 26B, 26A or 22. If each NDC site 24, 26B or 26A does not need to repetitively request data from downstream NDC sites, the networked digital computer system 20 can support a larger number of clients, such as the client workstation 42, with an acceptable response time.

The advantages of operating in autonomous mode are so significant that every reasonable effort is made to ensure that channels 116 operate in this mode whenever possible. The inability to operate a channel 116 autonomously is always the result of a single cause, i.e., the required data and metadata are not being projected into the local NDC site 26A, 26B or 24.

When operating in autonomous mode, an NDC site 22, 26A, 26B or 24 functions in a manner similar to the CCS. Whenever possible, the channels 116 of such an NDC site respond to requests to access data from a client, such as the client workstation 42, without communicating with downstream NDC site 26B, 26A or 22. If an upstream message should arrive at an NDC site 26A, 26B or 24 that is operating in autonomous mode while the NDC site 26A, 26B or 24 is processing a request on that same channel 116, the upstream message must wait until the NDC site 26A, 26B or 24 is able to process it. An autonomous NDC site 22, 26A, 26B or 24 has every right to operate as though it is the CCS until it is notified that it can no longer function in that manner. If the upstream message is a notification that the NDC site 26A, 26B or 24 may no longer function autonomously, that notice doesn't become effective until the NDC 50 processes the message.

After the client workstation 42 first requests access to data from the NDC client terminator site 24, the NDC sites 22, 26A, 26B and 24 establish their local channels 116, and the NDC sites 22, 26A and 26B load the first data into the NDC buffers 129 of the NDC client terminator site 24, the locally projected dataset image will always be sufficient to enable autonomous request servicing unless one of the following occurs:

-   Access to the same dataset by another client creates a CWS condition. If this occurs, the downstream CCS only permits images to be projected into the NDC site 26A, 26B or 24 during a brief instant as the projected data passes through the NDC site 26A, 26B or 24.
-   A client, such as the client workstation 42, requests access to data randomly, and the dataset being accessed by the client is too large to be completely cached at the NDC client terminator site 24. Random accesses to data by a client, such as the NDC client terminator site 24, prevent the channel 116 from anticipating future requests from the client. If a channel 116 determines that a client, such as the client workstation 42, is accessing data randomly, the channel 116 stops pre-fetching data for that client.

If neither of the preceding conditions occurs, channels 116 operate autonomously, pre-fetching data in anticipation of future requests to access data from the client, such as the client workstation 42. Depending on the current load being supported at NDC sites 22, 26A, 26B and 24, channels 116 at NDC sites that are operating autonomously pre-fetch data either asynchronously or synchronously.

Asynchronous Pre-Fetching

A channel daemon usually pre-fetches data for a channel 116 that is operating autonomously. The channel daemon keeps the image of data projected into the channel 116 just ahead of the next request to access data that the channel 116 receives from the client, such as the client workstation 42. If the main goal of the pre-fetch mechanism were to minimize the usage of local resources, the projected image would consist of only the exact data specified in the next client request, and the image of the data would always be projected just in advance of the client's next request. However, while this may conserve the resources at the NDC client terminator site 24, it is very wasteful of resources of the networked digital computer system 20. It is much more efficient for the networked digital computer system 20 to employ fewer requests and transfer larger amounts of data in response to each request to load data into the NDC client terminator site 24. However, transferring a larger amount of data will increase any delay in responding to a client request.

To minimize the delay in responding to a client request to access data, the channel 116 usually requests from the downstream NDC site 26B, 26A or 22 only that data which is required to respond to the current request to access data. As soon as the channel 116 receives the data, the channel 116 responds to the client, such as the client workstation 42. As soon as the channel 116 in the NDC client terminator site 24 responds to the request from the client, the NDC 50 begins processing any other client requests that have been queued. After the NDC 50 processes all queued client requests, channel daemons may begin operating in the background. As described above, operating channel daemons continuously check all active channels 116 to determine if the channels 116 are within one request time interval of being unable to immediately respond to a request from the client, such as the client workstation 42.

If a channel daemon detects that a channel 116 is within one request time interval of being unable to immediately respond to a request from the client, the daemon does whatever is necessary to obtain additional data from downstream NDC sites 26B, 26A and 22 so the image of data projected into the channel 116 stays ahead of requests to access data from the client. For a channel 116 in the NDC client terminator site 24 that is supporting read operations on datasets, the channel daemon asynchronously issues a request to the downstream NDC site 26B requesting roughly enough data to respond to the next eight requests from the client, such as the client workstation 42. When the data arrives from the downstream NDC site 26B, the channel 116 stores the data in the NDC buffers 129 selected by the daemon. The NDC buffers 129 used to receive the data are frequently the ones that are already being used by the channel 116 for the current projected image of data. In this way, that portion of the image that the NDC 50 has already presented to the client is replaced by a portion of the dataset toward which requests from the client are advancing.
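The refill decision made by the channel daemon can be illustrated with a short C sketch. It rests on stated assumptions: “one request time interval” is approximated here by a byte count (less than two requests' worth of data remaining ahead of the client), and the projection structure and the bytes_ahead( ), needs_refill( ) and refill_length( ) helpers are hypothetical names that do not appear in the figures. Only the factor of eight comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "roughly enough data to respond to the next eight requests" */
#define PREFETCH_REQUESTS 8

/* Hypothetical summary of one channel's projected image. */
struct projection {
    uint64_t image_end;    /* first byte offset past the cached image     */
    uint64_t next_offset;  /* offset the client is expected to read next  */
    uint32_t request_size; /* size of the client's sequential requests    */
};

/* Bytes of cached data still ahead of the client's next request. */
static uint64_t bytes_ahead(const struct projection *p)
{
    return p->image_end > p->next_offset
               ? p->image_end - p->next_offset
               : 0;
}

/*
 * Assumption: the channel is "within one request of being unable to
 * respond" when fewer than two requests' worth of data remain cached.
 */
static bool needs_refill(const struct projection *p)
{
    assert(p->request_size > 0);
    return bytes_ahead(p) < 2u * (uint64_t)p->request_size;
}

/* Amount the daemon would ask the downstream site for, in bytes. */
static uint64_t refill_length(const struct projection *p)
{
    return (uint64_t)PREFETCH_REQUESTS * p->request_size;
}
```

For example, with 8192-byte requests and 65536 bytes cached ahead of the client, no refill is issued; once only 8192 bytes remain, the sketch asks for 65536 more.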

If a request from the client, such as the client workstation 42, arrives while a channel daemon is refilling the channel 116, the NDC 50 blocks the request until the downstream operation initiated by the channel daemon completes. Thus, if channel daemons successfully anticipate client requests to access data, the channel 116 continues to operate autonomously.

Synchronous Pre-Fetching

The asynchronous mode of autonomous operation shifts as much processing as possible from the foreground task of servicing requests from the client, such as the client workstation 42, into the background task of preparing to service the next request from the client. The strategy of shifting processing from the foreground task to the background task trades off throughput for response time. Clients, such as the client workstation 42, experience faster response times, but the NDC site 22, 26A, 26B or 24 has reduced throughput capacity. This is a reasonable trade off since NDC sites 22, 26A, 26B and 24 rarely run near their throughput capacity. However, intervals in the operation of NDC sites 22, 26A, 26B and 24 will occur that require maximum throughput rather than minimum response time. During intervals of peak demand, a normally unused synchronous mode of pre-fetching data from downstream NDC sites replaces the asynchronous mode to maximize the throughput of the NDC sites 22, 26A, 26B and 24.

The synchronous mode of operation is activated if CPU utilization at an NDC site 22, 26A, 26B or 24 exceeds a pre-established threshold. In synchronous mode, the channel daemons are not activated and the routines for responding to requests to access data no longer defer to the channel daemons the loading of data into and unloading of data from the channels 116. When the NDC 50 operates in synchronous mode, data is requested from downstream NDC sites only if the upstream NDC site is unable to respond to a request.

If a channel 116 requires additional data and the NDC 50 is operating in the synchronous mode of autonomous operation, the channel 116 requests from the downstream NDC site the required data plus additional data to increase the efficiency of loading data into the channel 116 at this site. During intervals in which the NDC 50 operates in synchronous mode, large amounts of data are fetched directly by the channel 116 each time the channel 116 discovers that additional data not present in the NDC buffers 129 of this NDC site 22, 26A, 26B or 24 is required to respond to a request. By requesting large amounts of data from downstream NDC sites only when the channel 116 is unable to respond to a request to access data, the channel 116 maximizes throughput of its NDC 50, but clients, such as the client workstation 42, experience additional delay each time the channel 116 is compelled to request data from a downstream NDC site 26B, 26A or 22.

Concurrent Mode of Channel Operation

Projected images of data occur only in channels 116 that are operating autonomously. As explained in greater detail below, autonomous channels 116 always occur at, or downstream of, a CCS or an NDC client terminator site 24 that is functioning similar to a CCS. NDC sites 26A, 26B or 24 upstream of the CCS, when the CCS is located in the NDC server terminator site 22, always operate in concurrent mode. NDC sites 26A, 26B or 24 upstream of the CCS, when the CCS is located in the NDC server terminator site 22, operate as an extension of the CCS site through which the image of the dataset being projected into the CCS may be viewed.

Channels 116 operating in concurrent mode sustain an image of projected data for only the briefest period, i.e., from the time the channel 116 receives the data from the downstream NDC site until the channel 116 forwards data to the next upstream NDC site or to the client, such as the client workstation 42. Channels 116 operating in concurrent mode always request exactly the data required to satisfy the current request, never more and never less.

Consistency Control Operations

FIG. 15 depicts a tree, indicated by the general reference character 200, of NDC sites 22, 26A, 26B, 24, 202, 204A, 204B, and 206 that are connected to the file 156. LAN 44A connects to NDC client terminator site 204B while LAN 44B connects to NDC client terminator site 206. If a CWS condition were created by a combination of the NDC site 24 and either NDC site 204B or 206, NDC site 26A becomes the CCS for the file 156. NDC site 26A is as far as the file 156 can be projected from the NDC server terminator site 22 without requiring a distributed cache consistency mechanism.

If a CWS condition does not exist, all NDC sites 22, 26A, 26B, 24, 202, 204A, 204B, and 206 may operate autonomously. The NDC sites 22, 26A, 26B, 24, 202, 204A, 204B, and 206 when operating autonomously may sustain a projected image of data that may be used to support client read and write operations over an extended period of time. Autonomous sites communicate with the next downstream NDC site 204A, 202, 26B, 26A, or 22 only when the upstream NDC site 206, 204A, 204B, 202, 26A, 26B, or 24 requires additional data, or when modified data must be flushed downstream toward the NDC server terminator site 22.

However, if a CWS condition arises, the first NDC site 26A or 202 upstream of the data source, such as the hard disk 32, that provides multiple connections to the dataset for upstream NDC sites 206, 204B, or 24 must assume responsibility for maintaining the consistency and integrity of all operations being performed on the dataset. The NDC site 26A or 202 that assumes this responsibility is located furthest from the source of the data, such as the hard disk 32, through which must pass all requests to access the dataset from current clients, such as the client workstation 42. Thus, if a CWS condition were created by a combination of the NDC site 24 and either NDC site 204B or 206, NDC site 26A would become the CCS for the file 156.

If one of the NDC sites 26A or 202 declares itself to be the CCS for the dataset, the NDC site 26A or 202:

1.  recalls the image of the dataset that has been modified from the upstream NDC client terminator site 206, 204B, or 24 so that its image of the data contains all the modifications; and
2.  disables all other upstream projections of the data that were in use by NDC sites to support read operations on the dataset.

After completing these operations, the CCS is now the most distant NDC site into which images of the dataset will be projected. Upstream NDC sites must now operate in concurrent mode, forwarding any requests they receive to access the dataset to the CCS for processing. The CCS processes requests to access the dataset in the order they are received, and ensures completion of each request before beginning to process a succeeding request to access the dataset.

Detecting CWS

Each of the NDC sites 22, 26A, 26B, 24, 202, 204A, 204B, and 206 independently records whether a request to access a dataset will or will not modify the dataset. As an NDC site 22, 26A, 26B, 202, or 204A processes each request to access a dataset, it compares the requested operation with the operations that are being performed on the dataset at all other upstream NDC sites. If there are multiple upstream NDC sites accessing a dataset and any one of them is writing the dataset, then a CWS condition exists. As soon as an NDC site 26A or 202 detects a CWS, the NDC site 26A or 202 must declare itself to be the CCS as described above.
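The CWS test stated above reduces to a simple predicate over the access types recorded for the upstream NDC sites. The following C sketch illustrates that predicate only; the access_type enumeration and the cws_exists( ) function are hypothetical names chosen for illustration and are not taken from the figures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of what an upstream site is doing to the dataset. */
enum access_type { ACCESS_READ, ACCESS_WRITE };

/*
 * A CWS condition exists when more than one upstream site is accessing
 * the dataset and at least one of those sites is writing it.
 * 'upstream' holds one recorded access type per upstream site.
 */
static bool cws_exists(const enum access_type *upstream, size_t n)
{
    size_t writers = 0;

    assert(upstream != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (upstream[i] == ACCESS_WRITE)
            writers++;
    }
    return n > 1 && writers > 0;
}
```

Under this predicate a single upstream writer, or any number of upstream readers, does not create a CWS condition; one reader plus one writer does.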

To permit each NDC site 22, 26A, 26B, 202 and 204A to detect a CWS condition, each upstream NDC site 206, 204A, 204B, 202, 24, 26B, and 26A must keep its downstream NDC site informed of types of accesses, i.e., a “read” access that will not modify the dataset or a “write” access that will modify the dataset, that are being made to the dataset at the NDC client terminator site 206, 204B, or 24. Each downstream NDC site 204A, 202, 26B, 26A, and 22 must record and preserve the information provided it by its upstream NDC sites until the downstream NDC site 204A, 202, 26B, 26A, or 22 is notified of the death of the channel 116 at the upstream NDC site.

Informing Downstream NDC Sites

If a client, such as the client workstation 42, begins accessing a dataset with a new type of access, e.g., accessing the dataset with a “write” operation when all previous accesses have been “read” operations, the NDC site 26A, 26B or 24 responding to requests from the client must inform the downstream NDC site 22, 26A or 26B. Usually, this notification takes place automatically when the NDC site 26A, 26B or 24 requests access (NDC_LOAD message) to the dataset from the downstream NDC site 22, 26A or 26B. Since each NDC site 24, 26B and 26A requests only the data that is not present in its NDC buffers 129, the data requested by each successive NDC site 24, 26B or 26A may change from that requested from it. However, the nature of the request to access the dataset doesn't change. A request from a client, such as the client workstation 42, to the NDC client terminator site 24 to “read” a dataset remains a “read” operation as it propagates downstream from NDC site to NDC site. Similarly, a request to “write” a dataset remains a “write” as it propagates downstream.

However, if an image of a dataset has been projected in response to a request to read the dataset, and if the client then seeks to modify the dataset in an area that is wholly contained within the NDC buffers 129 of the NDC site 26A, 26B or 24, then no additional data is required from downstream NDC sites 22, 26A or 26B. However, if this occurs the dataset cannot be written immediately since the possibility exists that another client accessing the dataset at another NDC site might also be requesting to write the dataset. If two clients concurrently write the same dataset, there would then be two projected images of the same named set of data that, most likely, would be different!

Therefore, if a client seeks to perform a write operation on a projected image of a dataset that will overlay only data already loaded into the NDC buffers 129 of the NDC client terminator site 24 in response to requests to read the dataset, the NDC site 24 must send an inform message to downstream NDC sites 26B, 26A or 22. An inform message from an upstream NDC site 26A, 26B or 24 requests no data from the downstream NDC site 22, 26A or 26B. The inform message merely informs the downstream NDC site 22, 26A or 26B that write operations are now being performed on the dataset at the upstream NDC site 26A, 26B or 24.

After an NDC site 26B or 26A is informed, either implicitly or explicitly, that a write operation is being performed at an upstream NDC site 26B or 24, and if the activity on this dataset at upstream NDC sites 26B or 24 differs from the type of activity that was already being supported at the NDC site 26A or 26B, the NDC site 26A or 26B must transmit the inform message further downstream toward the NDC server terminator site 22.

An inform message propagating downstream from NDC site to NDC site may be rejected at any NDC site. If an inform message is rejected by a downstream NDC site, the rejection must propagate upstream until it reaches the client intercept routine 102 of the NDC site that originated the request. Upon receiving the rejection of an inform message, the client intercept routine 102 backs off and allows a recall/disable message, which has either already arrived or will arrive very shortly, to claim the channel 116 and recall or disable the image of the data currently present in the NDC buffers 129.

Upstream Site Structures

An NDC site 22, 26A or 26B receiving information about the activities occurring on a dataset at an upstream NDC site 26A, 26B or 24 must record and preserve the information. FIG. 14 depicts an upstream site structure 182, that is used by NDC sites 22, 26A, 26B, 202 or 204A to record and preserve information about activities occurring on a dataset at an upstream NDC site. Each NDC 50 creates upstream site structures 182 as required by invoking a memory allocation routine (such as the Unix malloc( ) routine) to request an area in RAM of about 16 to 20 bytes. The NDC 50 returns the RAM allocated for each upstream site structure 182 to the free memory pool upon receiving a decease notification from the upstream NDC site for which the NDC 50 created the upstream site structure 182.

If an NDC site 22, 26A, 26B, 202, or 204A has multiple upstream connections to the same dataset, it will have the same number of instances of the upstream site structures 182, one per upstream NDC site. The upstream site structures 182 are linked together using the *next element in each upstream site structure 182. The *uss element in the channel 116 for the dataset points at the first upstream site structure 182 in the list of upstream site structures 182. The *next entry in the last upstream site structure 182 in the list is assigned a NULL value. A NULL value is assigned to the *uss element in the channel 116 at the NDC client terminator site 24, indicating that there are no sites further upstream.

The other elements of the upstream site structure 182 are:

-   upstream_addr, which is the address of the upstream NDC site;
-   current_state, which is the state that this NDC site believes the upstream NDC site to be in;
-   actual_state, which is returned by the upstream NDC site in its response to a recall/disable message; and
-   error, which preserves an error condition occurring during a recall/disable operation until such time that the operation can be presented to the upstream NDC sites.
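
The linked list of upstream site structures can be sketched as follows. This is only an illustrative model in Python, not the patent's implementation: the field names follow the text, while the Channel class, its method names, and the state strings "read" and "write" are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class UpstreamSite:
    """One record per upstream NDC site (field names follow the text)."""
    upstream_addr: str                      # address of the upstream NDC site
    current_state: str = "idle"             # state this site believes the upstream site is in
    actual_state: Optional[str] = None      # state reported in a recall/disable response
    error: Optional[str] = None             # error kept from a recall/disable operation
    next: Optional["UpstreamSite"] = None   # link to the next record; None ends the list


@dataclass
class Channel:
    """Hypothetical stand-in for a channel; only the list head (uss) is modelled."""
    uss: Optional[UpstreamSite] = None      # None means no sites further upstream

    def add_upstream(self, addr: str, state: str) -> UpstreamSite:
        # Allocate a record and push it onto the front of the list.
        site = UpstreamSite(addr, current_state=state, next=self.uss)
        self.uss = site
        return site

    def remove_upstream(self, addr: str) -> bool:
        """Unlink the record for addr, as would follow a decease notification."""
        prev, node = None, self.uss
        while node is not None:
            if node.upstream_addr == addr:
                if prev is None:
                    self.uss = node.next
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False

    def upstream_sites(self) -> Iterator[UpstreamSite]:
        node = self.uss
        while node is not None:
            yield node
            node = node.next
```

A channel at a client terminator site would simply keep uss set to None, matching the NULL value described above.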

Channel Decease Notifications

The downstream NDC site 22, 26A, 26B, 202, or 204A must at all times be aware of the types of activities being performed at its upstream NDC sites. When channels 116 upstream from an NDC site 22, 26A, 26B, 202, or 204A are about to die, they must inform their downstream NDC site. When a channel 116 dies, it ceases whatever type of activity it had been performing.

If a downstream NDC site 26A or 202 that is currently the CCS receives a decease notification from a channel 116 at an upstream NDC site, the current CCS may determine that the CWS condition no longer exists. When this occurs, the CCS relinquishes the CCS function and allows images of data to be re-projected into upstream NDC sites in response to requests to access data.

If a channel 116 receives a decease notification from its only upstream NDC site 26A, 26B, 24, 202, 204A, 204B, or 206 and there are no local clients such as the client workstation 42 accessing the dataset, the channel 116 immediately dies. In dying, each channel 116 issues its own decease notification to its downstream NDC site.
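
The cascade of decease notifications described above can be modelled in a few lines. The sketch below is a simplified illustration under stated assumptions: the Channel class, its attribute names, and the log list are hypothetical, and real channels would exchange DTP messages rather than call each other directly.

```python
from typing import List, Optional


class Channel:
    """Hypothetical model of a channel at one NDC site."""

    def __init__(self, site: str, downstream: Optional["Channel"] = None):
        self.site = site
        self.downstream = downstream
        self.upstream: List[str] = []   # upstream sites still connected
        self.local_clients = 0          # clients accessing the dataset locally
        self.alive = True
        if downstream is not None:
            downstream.upstream.append(site)

    def die(self, log: List[str]) -> None:
        """Cease activity and issue a decease notification downstream."""
        self.alive = False
        log.append(self.site)
        if self.downstream is not None:
            self.downstream.on_decease(self.site, log)

    def on_decease(self, upstream_site: str, log: List[str]) -> None:
        """Handle a decease notification from an upstream site."""
        self.upstream.remove(upstream_site)
        # Die only if that was the last upstream site and nobody local is active.
        if not self.upstream and self.local_clients == 0:
            self.die(log)
```

With a single chain 22, 26A, 26B, 24, a death at site 24 propagates all the way downstream; where a site still has another upstream connection, the cascade stops there.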

Recall/Disable Messages

If an NDC site 22, 26A, 26B, 202 or 204A receives an inform message, which occurs implicitly in every communication from upstream NDC sites, the NDC site 22, 26A, 26B, 202, or 204A checks to determine if this type of activity is already being supported at the upstream NDC site. If this type of activity is not already being supported at the upstream NDC site, then the new type of activity may have created a CWS condition.

If an NDC site 26A or 202 determines that a CWS condition has just been created, it must immediately disable all upstream projections of the dataset and recall any data that has been modified at the upstream NDC site 206, 204B, or 24. To disable all upstream projections and recall any modified data, the downstream NDC site 26A or 202 processes its list of upstream site structures 182, sending a disable message to each upstream NDC site 206, 204B, and/or 24 that is reading the dataset, or a recall message to the single upstream NDC site 206, 204B, or 24 that is writing the dataset.

Ignoring for the time being the NDC site 206, 204B, or 24 whose request to access data created the CWS condition, when an NDC site 202 or 26A determines that it must become the CCS, there can only be one or more client workstations 42 that are reading the dataset, or a single client workstation 42 that is writing the dataset. In responding to the CWS condition, the newly declared CCS either issues a single recall message to an upstream NDC site, or one or more disable messages. The manner in which a CWS condition occurs determines whether the CCS will send either a single recall message or one or more disable messages.

If one or more client workstations are accessing the dataset for reading it and a client workstation subsequently begins to write the dataset, then the newly declared CCS issues disable messages to all upstream NDC sites other than the one that created the CWS condition, and then appropriately responds to the request that just created the CWS condition. If the NDC client terminator site that created the CWS condition has in its NDC buffers 129 a projected image of all the data needed for writing the dataset, then the newly declared CCS merely informs the NDC client terminator site that the projected image of the data must be flushed back to the CCS upon completion of the write operation. If the NDC client terminator site that created the CWS condition has requested additional data from downstream NDC sites because its NDC buffers 129 lack a projected image of all the data needed for writing the dataset, then the newly declared CCS does whatever is necessary to supply the NDC client terminator site with the requested data and concurrently instructs the NDC client terminator site that it must flush the projected image of the data back to the CCS upon completion of the write operation.

If a single client workstation is writing the dataset and another client workstation subsequently creates a CWS condition by accessing the dataset for any purpose, then the newly declared CCS issues a single recall message to the NDC client terminator site that has been writing the dataset, waits for the projected image of the dataset to be flushed back from the NDC client terminator site to the CCS, and then does whatever is necessary to respond to the request that created the CWS condition.
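
The choice between a single recall message and one or more disable messages can be illustrated with a small decision function. This is a sketch only: the function name, the dictionary representation of upstream activity, and the mode strings "read" and "write" are assumptions, and a CWS condition is taken here to arise when two or more sites access the dataset and at least one of them writes it.

```python
from typing import Dict, List, Tuple


def respond_to_request(upstream: Dict[str, str], requester: str,
                       mode: str) -> List[Tuple[str, str]]:
    """Return the (message, site) pairs a newly declared CCS would send.

    upstream maps each upstream site to "read" or "write" and describes the
    activity before the request arrives. An empty list means no CWS condition.
    """
    # The site whose request creates the condition is handled separately.
    others = {site: m for site, m in upstream.items() if site != requester}
    writers = [s for s, m in others.items() if m == "write"]
    readers = [s for s, m in others.items() if m == "read"]
    if writers:
        # A single writer existed; any new access creates the CWS condition.
        return [("recall", writers[0])]
    if readers and mode == "write":
        # Readers existed and a writer arrives: disable every other reader.
        return [("disable", s) for s in sorted(readers)]
    return []
```

For example, a write request from site 204B while sites 204A and 206 are reading yields two disable messages, whereas any request arriving while site 206 is writing yields one recall message to site 206.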

If several clients, such as the client workstation 42, are widely distributed across a network and concurrently submit requests that will result in a CWS condition, the message from each NDC site races with messages from the other NDC site(s) to whichever NDC site will eventually become the CCS. The first message to reach the NDC site that will become the CCS is processed first and blocks the further processing of later arriving messages until it has been completely processed. All messages arriving after the first message queue up in the order of their arrival at the NDC site that will eventually become the CCS. After the first message is completely processed, these later arriving messages are processed one after another in the order of their arrival. Eventually the NDC 50 processes the message that creates the CWS condition. When the CWS condition occurs, the NDC 50 immediately dispatches the recall/disable message(s) to the upstream NDC sites. Any messages from other NDC sites that remain enqueued at the newly declared CCS are processed in order, and each is rejected because the channel 116 is busy recalling or disabling the NDC sites that issued these messages.

Responding to a CWS condition does not necessarily require two different types of messages, i.e., a disable message and a recall message. A single type of message that commanded upstream NDC sites to disable their caches, and flush dirty data back to the CCS as part of the disable process at the upstream NDC sites would suffice. However, using two distinct message types allows the upstream NDC sites to confirm their agreement on the current state of their channels 116.

Upstream and Downstream Messages

Recall and disable messages are referred to as “upstream” messages, because they flow upstream from the NDC site that transmits them. The status query is another type of upstream message. Except for decease notifications, all other requests are initiated by real clients, such as the client workstation 42, and always flow downstream. Such messages may be generically referred to as “downstream” messages.

If there are multiple upstream NDC sites, several recall/disable messages are all transmitted asynchronously at about the same time. The process generating these messages then blocks the processing of additional messages for the channel 116 at this NDC site until all upstream NDC sites have responded or until the time interval allowed for a response expires. If an NDC site fails to respond within the time allowed, a timeout error is recorded in the appropriate upstream site structure 182. If later an upstream channel 116 for which a timeout error has been recorded attempts to re-establish communication with the downstream channel 116, it will be notified that it has been disconnected from the dataset. If clients along a path were only reading a dataset, it is likely that they may continue processing the dataset without being notified of the disruption. However, if one of the clients has modified an image that is stored within the NDC buffers 129 at an NDC site that has been disconnected from the network, perhaps due to a communication failure, and the dataset's modification time indicates that the dataset has been modified since service was interrupted, then if an attempt is made to flush the modified data back toward the NDC server terminator site 22, the flush request must be rejected. The rejection of the flush request must propagate to all upstream NDC sites, and cause an error message to be presented to the client, such as the client workstation 42.
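
The timeout bookkeeping described above can be sketched as follows. The example deliberately simulates reply delays with plain numbers instead of performing real asynchronous I/O; the function names, the "timeout" error string, and the response_delay mapping are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UpstreamSite:
    upstream_addr: str
    error: Optional[str] = None   # error preserved from a recall/disable operation


def collect_responses(sites: List[UpstreamSite],
                      response_delay: Dict[str, Optional[float]],
                      timeout: float) -> List[str]:
    """Record a timeout error for every site that does not answer in time.

    response_delay gives each site's simulated reply delay in seconds; a
    missing entry or None means no reply ever arrives. Returns the addresses
    of the sites that timed out, in list order.
    """
    timed_out = []
    for site in sites:
        delay = response_delay.get(site.upstream_addr)
        if delay is None or delay > timeout:
            site.error = "timeout"
            timed_out.append(site.upstream_addr)
    return timed_out


def may_reconnect(site: UpstreamSite) -> bool:
    """A site with a recorded timeout is told it has been disconnected."""
    return site.error != "timeout"
```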

In addition to communication failures, other types of errors are also possible during a recall/disable operation. Any errors that occur along an upstream path during a recall/disable operation are stored in the appropriate upstream site structure 182, and are presented to downstream NDC sites later. Errors that occur outside of the direct connection between the client, such as the client workstation 42, and the NDC server terminator site 22 cannot affect the result of operations performed on the dataset by the client at the NDC client terminator site 24. Upstream errors are processed the next time NDC sites along the path experiencing the error request access to the dataset.

The RLCCS Mechanism

To guarantee dataset consistency while simultaneously providing very good response times to requests from clients, such as the client workstation 42, the present invention implements a concept called ReLocatable Consistency Control Sites (“RLCCS”). Under RLCCS, the first NDC site along the path from the NDC server terminator site 22 to the NDC client terminator site 24 that detects a CWS condition becomes the dataset's CCS. If a CWS condition does not exist, there is no CCS since there is no dataset consistency issue that needs to be resolved. However, when a CWS condition arises, there can be only one NDC site responsible for maintaining the consistency between all projected images. This site will always be the first upstream NDC site that has multiple upstream connections.

RLCCS is the means by which the CCS is located in the most extended position possible to enable the maximum amount of non-distributed consistency control. RLCCS ensures that the CCS is positioned to most efficiently resolve dataset contention arising from a CWS condition.

RLCCS implements a non-distributed cache consistency control strategy in a file level distributed cache. Instead of passing messages between caching sites, such as the NDC sites 26A and 202, to maintain a consistent projection of the data cached at the various NDC sites, each NDC site monitors the type of activity occurring at each of its upstream NDC sites and disables caching at those sites when a CWS condition occurs.

If an NDC site determines that the activity at its upstream NDC sites creates a CWS condition, the NDC site becomes the CCS for the file 156 and issues recall/disable messages to all of its upstream NDC sites. Each upstream site, upon receiving a recall/disable message, recalls or disables all of its upstream NDC sites before responding to the message from the newly established CCS. After the recall activity completes, the CCS and all NDC sites downstream of the CCS are enabled for caching, and all NDC sites upstream of the CCS operate as conduits for file data that is passing through them.

Relocation of the CCS, if it becomes necessary, is performed only when the CCS receives a request that creates a CWS condition. As described below, there are two basic methods of relocating the CCS.

Upstream Relocation of the CCS

Upstream relocation moves the CCS to an NDC site that is closer to the client, such as the client workstation 42, than the present CCS. A DTP response to a request to access data includes a “use ticket” that accompanies data which is being passed upstream from NDC site to NDC site. The DTP use ticket may be marked as USE_ONCE or USE_MANY depending upon whether the image of the data may remain cached at an NDC site after it has been used to respond to the request that caused the data to be fetched from downstream. The DTP use ticket for an image of data is always marked as USE_MANY when it begins its journey from the NDC server terminator site to the client site. However, as the image of the data passes upstream from NDC site to NDC site, its use may be restricted to USE_ONCE at any NDC site through which it passes. Thus, when the image of the data passes through the current CCS for the file 156, the channel 116 at that NDC site changes the data's DTP use ticket from USE_MANY to USE_ONCE.

As the image of the file 156 is projected through successive NDC sites, if the DTP use ticket is marked as USE_MANY, the image of the data may remain cached within the NDC buffers 129 assigned to the channel 116 through which the image traverses the NDC site. Whether or not any data remains cached within the NDC buffers 129 assigned to the channel 116 after passing through the NDC site is determined solely by the local site. Maintaining a projected image of data at an NDC site is a resource allocation issue, and each NDC site must maintain control of its own resources. However, if the DTP use ticket is marked USE_ONCE, none of the data may remain cached within the NDC buffers 129 assigned to the channel 116 after traversing the NDC site.
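
The effect of the use ticket along a path can be sketched as below. The ticket values USE_MANY and USE_ONCE come from the text; the function name, the list representation of the path, and the returned set are assumptions made for illustration, and the sketch ignores each site's local resource policy.

```python
from typing import List, Optional, Set

USE_MANY = "USE_MANY"
USE_ONCE = "USE_ONCE"


def project_upstream(path: List[str], ccs: Optional[str]) -> Set[str]:
    """Return the sites on path that are permitted to keep the image cached.

    path lists the sites from the server terminator site to the client
    terminator site; ccs names the current CCS, or None if there is none.
    """
    ticket = USE_MANY                # the ticket always starts as USE_MANY
    may_cache = set()
    for site in path:
        if ticket == USE_MANY:
            may_cache.add(site)      # whether to keep it remains a local decision
        if site == ccs:
            ticket = USE_ONCE        # the CCS restricts every site further upstream
    return may_cache
```

With the path 22, 26A, 202, 204A, 206 and site 202 acting as CCS, only sites 22, 26A and 202 may retain the image; sites 204A and 206 act as conduits.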

Upstream relocation of the CCS due to a decease notification requires only that the current CCS recognize if it no longer has multiple upstream NDC sites engaged in CWS activities. When that occurs, the NDC site that formerly was the CCS merely stops marking the DTP use ticket USE_ONCE. This change in the marking of the DTP use ticket immediately permits upstream NDC sites to begin caching any images of the file 156 that may be projected into them in the future.

However, if one of the upstream NDC sites currently has additional upstream NDC sites that are creating a CWS condition, that NDC site will declare itself to be the new CCS and begin changing the DTP use ticket from USE_MANY to USE_ONCE. In this way, the NDC 50 of the present invention facilitates relocating the CCS upstream.

Downstream Relocation of the CCS

Relocating the CCS downstream moves the CCS to an NDC site closer to the NDC server terminator site 22. Referring to FIG. 15, if no clients are accessing the file 156 and then if a client on LAN 44B requests access for writing the file 156 residing on the NDC server terminator site 22, a projected image of the file 156 flows from NDC site 22, through NDC sites 26A, 202, 204A, and into NDC site 206. The client may now read and write the projection of the file 156 present in the NDC client terminator site 206 with an unlimited number of simultaneous processes without the NDC client terminator site 206 checking with any of the downstream NDC sites 204A, 202 or 26A, or with the NDC server terminator site 22 before each operation. The NDC client terminator site 206 need communicate with the downstream NDC sites 204A, 202, 26A and 22 only to load or unload data from the channel 116 at the NDC client terminator site 206.

If a client on LAN 44A connected to the NDC site 204B begins to access the file 156 for writing it, the NDC client terminator site 204B claims a channel 116 that then sends an NDC_LOAD message to intermediate NDC site 202. The NDC_LOAD message from the channel 116 will indicate that NDC site 204B is loading data that will be overlaid by a write operation. Upon processing this NDC_LOAD message, the NDC site 202 finds that a channel 116 already exists for the file 156. The existing channel 116 identifies NDC site 204A as a current upstream NDC site, and also indicates that the channel 116 for the file 156 is currently enabled. This combination of conditions implies that the CCS for the file 156, if one exists, is located either at NDC site 204A or at an NDC site upstream from NDC site 204A. As described above, the upstream site structures 182 at the NDC site 202 not only identify all upstream NDC sites accessing the file 156, they also indicate the type of file operations that have occurred at each NDC site accessing the file 156. These few facts, i.e. the existence of a CWS condition and that the CCS is not currently located downstream from the NDC site 202, enable site 202 to determine that it should declare itself the CCS.

While holding off the write request from the NDC site 204B, NDC site 202 recalls or disables all upstream NDC sites that are caching projected images of the file 156. As described above, “disable” is sufficient for any NDC sites at which the file 156 was only being read. However, if there are any sites that have modified their image of the file 156, their dirty data must be flushed back to the new CCS, NDC site 202. Therefore, NDC site 202 sends a recall message to NDC site 204A.

Before NDC site 204A responds to the recall message from NDC site 202, NDC site 204A transmits its own recall message upstream to NDC client terminator site 206. After all of the upstream NDC sites have responded to the recall message from NDC site 204A, NDC site 204A will respond back to NDC site 202, forwarding any dirty data that had been soiled by NDC site 204A, or by NDC sites upstream from NDC site 204A.

After NDC site 204A responds to the recall message from NDC site 202, NDC site 202 can begin processing the write request from NDC site 204B. NDC site 202 has now declared itself to be the CCS for file 156. NDC site 202 is now in charge of sequencing all read/write operations that are requested for the file 156 by its own clients, and by clients of all upstream NDC sites, e.g. NDC sites 204A, 204B and 206.

While the intermediate NDC site 202 remains the CCS with multiple connections to upstream NDC sites 204A and 204B, at least one of which is writing the file 156, no file data or metadata will be cached upstream of the intermediate NDC site 202. If, after all NDC sites that were accessing the file 156 for writing have disconnected from the file 156, the intermediate NDC site 202 as CCS still has one or more upstream NDC sites that are reading the file 156, the CCS will relocate upstream as described above.

INDUSTRIAL APPLICABILITY

Within a networked digital computer system, file servers, workstations, gateways, bridges, and routers are all potential candidates to become an NDC site. The NDC 50 is a software module that can easily be ported to different environments. The NDC 50 requires a minimum of 250 k bytes of RAM, of which 50 k is code and the remainder is allocated for various data structures and buffers. Each channel 116 occupies approximately 500 bytes of RAM. Thus, one megabyte of RAM can accommodate about two thousand channels 116. At current memory prices, this amount of RAM costs well under $50. As illustrated in FIG. 4, the structure for the subchannel 118 included in each channel 116 provides pointers to 18 NDC buffers 129. In the preferred embodiment of the invention, each NDC buffer 129 stores 8 k bytes of projected data. Thus, the eighteen NDC buffers 129 associated with each channel 116 can store an image of up to 18*8 k bytes, i.e. 144 k bytes. Thus, with no additional subchannels 152, each channel 116 can accommodate the complete projection, both of data and of NDC metadata, of any dataset of up to 144 k bytes in length.

An NDC site having only 250 k bytes RAM would be useful for only certain limited applications. Each site usually allocates anywhere from 4 to 256 megabytes of RAM for its NDC 50. For example, a 128 megabyte NDC site that allocates 32 megabytes of RAM for NDC data structures can maintain over 50,000 simultaneous connections to data conduits 62 while also storing 96 megabytes of data image projections. Because accessing large datasets may require more than one channel 116, the number of simultaneous dataset connections will vary depending on the mix of datasets which are currently being accessed.
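
The sizing figures in the two preceding paragraphs can be checked with a short calculation. The sketch assumes that "k" means 1024 bytes and that a megabyte is 1024*1024 bytes; the constant and function names are introduced here purely for illustration.

```python
CHANNEL_BYTES = 500            # approximate RAM per channel, from the text
BUFFERS_PER_CHANNEL = 18       # NDC buffers addressed by one subchannel
BUFFER_BYTES = 8 * 1024        # 8 k bytes per NDC buffer (assuming k = 1024)
MEGABYTE = 1024 * 1024         # assumed definition of a megabyte


def channels_in(ram_bytes: int) -> int:
    """Number of whole channels that fit in the given amount of RAM."""
    return ram_bytes // CHANNEL_BYTES


def max_projection_bytes() -> int:
    """Largest image one channel can hold without additional subchannels."""
    return BUFFERS_PER_CHANNEL * BUFFER_BYTES


print(channels_in(MEGABYTE))           # roughly two thousand channels
print(channels_in(32 * MEGABYTE))      # comfortably over 50,000 channels
print(max_projection_bytes() // 1024)  # 144 (k bytes)
```

Under these assumptions the 32 megabytes of data structures hold more channels than the 50,000 connections quoted, which leaves room for the other data structures the NDC needs.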

With so many channels 116 packed into a single NDC site, the task of quickly connecting a new request to the channel 116 for the specified dataset, or claiming the least recently used channel 116 if there is none, might seem to be a daunting feat. However, the NDC 50 provides two mechanisms that facilitate solving this problem. The channel hash lists and the channel free list are methods of stringing together the channels 116 in such a way that any particular channel 116, or the least recently used channel 116, can be quickly located. Moreover, preferably the number of hash buckets allocated at each NDC site is adjusted so that, on the average, there are 4 channels 116 in each hash bucket. Limiting the number of channels 116 in each hash bucket to 4 permits quickly determining whether or not an NDC site presently has a channel 116 assigned to accessing a particular dataset.
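
The combination of hash lists and a least-recently-used list can be sketched as follows. This is not the patent's data layout: the ChannelTable class and its method names are hypothetical, dataset identifiers are plain strings, and Python's OrderedDict stands in for the channel free list.

```python
from collections import OrderedDict
from typing import List, Optional


class ChannelTable:
    """Hypothetical channel lookup: hash buckets plus an LRU ordering."""

    def __init__(self, max_channels: int, avg_per_bucket: int = 4):
        self.max_channels = max_channels
        # Size the bucket array so that a full table averages 4 per bucket.
        self.n_buckets = max(1, max_channels // avg_per_bucket)
        self.buckets = [[] for _ in range(self.n_buckets)]
        self.lru = OrderedDict()    # oldest entry first

    def _bucket(self, dataset_id: str) -> List[str]:
        return self.buckets[hash(dataset_id) % self.n_buckets]

    def connect(self, dataset_id: str) -> Optional[str]:
        """Find or claim a channel; return the evicted dataset id, if any."""
        bucket = self._bucket(dataset_id)
        if dataset_id in bucket:
            self.lru.move_to_end(dataset_id)   # mark as most recently used
            return None
        evicted = None
        if len(self.lru) >= self.max_channels:
            # Reclaim the least recently used channel.
            evicted, _ = self.lru.popitem(last=False)
            self._bucket(evicted).remove(evicted)
        bucket.append(dataset_id)
        self.lru[dataset_id] = None
        return evicted
```

A lookup scans only one short bucket, and reclaiming a channel takes the entry at the head of the ordering, so neither operation depends on the total number of channels.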

If the NDC client terminator site 24 receives a request from the client workstation 42 to access a dataset for which the NDC client terminator site 24 is also the NDC server terminator site, and if the request seeks to access data that is not currently being projected into the NDC buffers 129 of the NDC site 24, the delay in responding to the first request as measured at the client intercept routine 102 is approximately 25 milliseconds (about the same as for NFS). However, once the NDC 50 dispatches a response to the client workstation 42, the site will employ intelligent, efficient, and aggressive read ahead to ensure that as long as the client workstation 42 continues to access the file sequentially, data will almost always be projected into the NDC buffers 129 of the NDC client terminator site 24 before the client workstation 42 requests to access it. By pre-fetching data in this manner, responses to most subsequent requests from the client workstation 42 can be dispatched from the NDC client terminator site 24 to the client workstation 42 within 100 microseconds from the time the NDC site 24 receives the request.

If the client workstation 42 requests to access a dataset that is at an NDC site other than the NDC client terminator site 24, such as NDC sites 26B, 26A or 22, responding to the first request from the client workstation 42 requires an additional 25 millisecond delay for each NDC site that must respond to the request. However, because the NDC client terminator site 24 attempts to pre-fetch data for the client workstation 42, the NDC site 24 will dispatch responses to subsequent requests from the client workstation 42 in about 100 microseconds as described above.
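
The response-time figures above lend themselves to a rough estimate. The sketch below is a back-of-the-envelope model only: it assumes a flat 25 milliseconds per NDC site involved in the first request and assumes every later sequential request is satisfied from pre-fetched data in 100 microseconds. The function names are hypothetical.

```python
FIRST_REQUEST_MS_PER_SITE = 25.0   # approximate figure from the text
PREFETCHED_RESPONSE_MS = 0.1       # 100 microseconds


def first_request_delay_ms(sites_involved: int) -> float:
    """Estimated delay for the first request across the given number of sites."""
    return FIRST_REQUEST_MS_PER_SITE * sites_involved


def sequential_read_ms(sites_involved: int, n_requests: int) -> float:
    """Estimated total time for n sequential requests with ideal pre-fetching."""
    if n_requests <= 0:
        return 0.0
    later = (n_requests - 1) * PREFETCHED_RESPONSE_MS
    return first_request_delay_ms(sites_involved) + later
```

Under this model, a path through four sites (24, 26B, 26A and 22) costs about 100 milliseconds for the first request, after which the per-request cost is dominated by the 100 microsecond local response.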

While the presently preferred embodiment of the NDC 50 is implemented in software, it may also be implemented in firmware by storing the routines of the NDC 50 in a Read Only Memory (“ROM”). Furthermore, the operation of the NDC 50 is independent of any particular communication hardware and protocol used to implement the LAN 44, and of the filesystem that is used for accessing the hard disks 32, 34 and 36. Analogously, the operation of the NDC 50 is independent of the communication hardware and communication protocol by which DTP messages 52 pass between pairs of NDC sites 22-26A, 26A-26B, or 26B-24. The communication hardware and protocols for exchanging DTP messages 52 include backplane buses such as the VME bus, local area networks such as Ethernet, and all forms of telecommunication. Accordingly, DTP messages 52 exchanged between NDC sites may pass through gateways, including satellite data links, routers and bridges.

While the NDC 50 has been described thus far in the context of a distributed multi-processor computer system 20 in which various NDC sites, such as the sites 22, 26A, 26B and 24, are envisioned as being separated some distance from each other, the NDC 50 may also be applied effectively within a single computer system that incorporates a network of computers. FIG. 16 depicts a file server referred to by the general reference character 300. Those elements depicted in FIG. 16 that are common to the digital computer system 20 depicted in FIG. 1 carry the same reference numeral distinguished by a double prime (“″”) designation. The file server 300 includes a host processor 302 for supervising its overall operation. Within the file server 300, an internal bus 304, perhaps a VME bus, couples the main host processor 302 to a pair of storage processors 306A and 306B. The storage processors 306A-B control the operation of a plurality of hard disks 32A″ through 32F″. The internal bus 304 also couples a pair of file processors 312A and 312B, a pair of shared primary memories 314A and 314B, and a plurality of Ethernet processors 316A through 316D to the host processor 302, to the storage processors 306A-B, and to each other.

During the normal operation of the file server 300 without the incorporation of any NDCs 50, the Ethernet processors 316A-D receive requests to access data stored on the disks 32A″ through 32F″ from clients such as the client workstation 42 that is illustrated in FIG. 1. The requests received by the Ethernet processors 316A-D are transferred to one of the file processors 312A-B. Upon receiving a request to access data, the file processor 312A or 312B communicates with one of the storage processors 306A or 306B via the internal bus 304 to effect the transfer of an image of the data from the disks 32A″ through 32F″ into the primary memories 314A-B. After an image of the requested data has been transferred into the primary memories 314A-B, the Ethernet processor 316 that received the request then transmits the requested data to the client, thereby responding to the request.

The file processors 312A-B may incorporate a hard disk cache located in the primary memories 314A-B. The presence of a hard disk cache in the file server 300 allows it to respond to some requests to access data without any communication between one of the file processors 312A-B and one of the storage processors 306A-B. However, even though the file server 300 includes a hard disk cache, during operation of the file server 300 responding to each request to access data received by the Ethernet processors 316A-D necessarily involves communications between the Ethernet processors 316A-D and the file processors 312A-B. That is, even though data needed by the Ethernet processors 316A-D for responding to requests is already physically present in the primary memories 314A-B, to gain access to the data the Ethernet processors 316A-D must first communicate with the file processors 312A-B because the data is stored in a hard disk cache under the control of the file processors 312A-B.

To enhance the overall performance of the file server 300, each of the Ethernet processors 316A-D may incorporate an NDC 50 operating as an NDC client terminator site. Each NDC 50 included in the Ethernet processors 316A-D accesses a set of NDC buffers 129 allocated within the primary memories 314A-B. In addition to the NDCs 50 included in the Ethernet processors 316A-D, the file server 300 may also include other NDCs 50 operating as NDC server terminator sites in the file processors 312A-B. The NDCs 50 in the file processors 312A-B also access a set of NDC buffers 129 allocated within the primary memories 314A-B.

In a file server 300 so incorporating NDCs 50, if one of the Ethernet processors 316A-D receives a request to access data that is already present in the NDC buffers 129 of its NDC 50, its NDC 50 may respond immediately to the request without communicating with an NDC 50 located in one of the file processors 312A-B. Analogously, if one of the Ethernet processors 316A-D receives a request to access data that is not present in its NDC buffers 129 but that is present in the NDC buffers 129 of the NDCs 50 in the file processors 312A-B, those NDCs 50 may also respond immediately to the request without accessing the hard disk cache controlled by the file processors 312A-B. Under such circumstances, the NDC 50 operating in the file processors 312A-B may immediately respond to a request from the NDC 50 operating in the Ethernet processors 316A-D merely by providing it with a pointer to the location of the data within the primary memories 314A-B. Thus, by employing NDCs 50 both in the Ethernet processors 316A-D and in the file processors 312A-B, data that is physically present in NDC buffers 129 located in the primary memories 314A-B becomes available more quickly to the Ethernet processors 316A-D for responding to requests from clients such as the client workstation 42 by eliminating any need to access the hard disk cache controlled by the file processors 312A-B.

Although the present invention has been described in terms of the presently preferred embodiment, it is to be understood that such disclosure is purely illustrative and is not to be interpreted as limiting. Consequently, without departing from the spirit and scope of the invention, various alterations, modifications, and/or alternative applications of the invention will, no doubt, be suggested to those skilled in the art after having read the preceding disclosure. Accordingly, it is intended that the following claims be interpreted as encompassing all alterations, modifications, or alternative applications as fall within the true spirit and scope of the invention.

1-66. (canceled)
 67. A cache apparatus for use in a network wherein said cache apparatus receives network file-services-protocol requests from a plurality of client workstations coupled to said network and responds to said requests, said cache apparatus comprising: a plurality of Network Distributed Cache (NDC) sites, at least one NDC site being an NDC client terminator site for at least one of said plurality of client workstations, each NDC site being a server terminator site for one dataset of a plurality of datasets, each NDC site including an NDC having: a storage memory and a cache memory for storing data to be transmitted in responding to said network file-services-protocol requests, said storage memory storing said one dataset; a network processor that couples said cache apparatus to said network wherein said network processor includes program instructions executed by said processing unit for receiving said network file-services-protocol requests and transmitting responses to said requests; a file processor including program instructions executed by said file processor for interpreting said network file-services-protocol requests and generating responses to said requests including checking said cache memory to determine if an image of data specified by said request is present, and when said data is present, retrieving said data to be included in said response; and a storage processor including program instructions executed by said storage processor for storing data requests received from said network and for generating network file-services-protocol requests for data specified in said requests received by said file processor and determined to be missing from said cache memory by said file processor, said storage processor requests being transmitted to said storage memory when said data specified in said requests is in said one dataset and being transmitted to said network by said network interface when said data specified in said requests is not in said one dataset.
 68. A cache apparatus for use in a network wherein said cache apparatus receives network file-services-protocol requests from a plurality of client workstations coupled to said network and responds to said requests, said cache apparatus comprising: a storage memory and a cache memory for storing data to be transmitted in responding to said network file-services-protocol requests, said storage memory storing said one dataset; a network processor that couples said cache apparatus to said network wherein program instructions executed by said network processor receive said network file-services-protocol requests and transmit network file-services-protocol responses to said requests; a file processor wherein program instructions executed by said file processor in interpreting said network file-services-protocol requests and generating network file-services-protocol responses to said requests, check said cache memory to determine if an image of data specified by said request is present, and when said data is present, retrieving said data to be included in said response; and a storage processor wherein program instructions executed by said storage processor store data requests received from said network and generate network file-services-protocol requests for data: a. specified in said network file-services-protocol requests received by said storage processor; and b. determined to be missing from said cache memory by said file-request-service module.
 69. A cache apparatus for use in a network wherein said cache apparatus receives network file-services-protocol requests from a plurality of client workstations coupled to said network and responds to said requests, said cache apparatus comprising: a storage memory for storing a dataset associated with the cache apparatus and a cache memory for storing data from the dataset and other data to be transmitted in responding to said network file-services-protocol requests, said storage memory storing said one dataset; a network processor that couples said cache apparatus to said network wherein said network processor includes program instructions executed by said processing unit for receiving said network file-services-protocol requests and transmitting responses to said requests; a file processor including program instructions executed by said file processor for interpreting said network file-services-protocol requests and generating responses to said requests including checking said cache memory to determine if an image of data specified by said request is present, and when said data is present, retrieving said data to be included in said response; and a storage processor including program instructions executed by said storage processor for storing data requests received from said network and for generating network file-services-protocol requests for data specified in said requests received by said file processor and determined to be missing from said cache memory by said file processor, said storage processor requests being transmitted to said storage memory when said data specified in said requests is in said dataset and being transmitted to said network by said network interface when said data specified in said requests is not in said dataset.